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ABSTRACT 

This  report  contains  a  detailed  study  of  random  correlation  matrices,  including  algebraic, 
statistical,  and  historical  background.  Such  matrices  are  of  particular  interest  because  they 
serve  to  model  ‘average  signals’  for  simulation  testing  of  signal  processing  algorithms.  The 
latter  half  of  this  report  extensively  discusses  the  statistical  behavior  of  spectral  functions 
of  the  two  major  types  of  random  correlation  matrices,  from  both  theoretical  and 
empirical  aspects.  Our  emphasis  then,  is  on  eigenvalue  distribution  and  condition  number 
behavior.  Actual  application  to  algorithm  testing  will  be  described  in  a  subsequent  report. 
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ON  RANDOM  CORRELATION  MATRICES 

1.  INTRODUCTION 

This  report  is  based  on  a  study  of  the  relative  efficacy  of  certain  (group-theoretic)  data  trans¬ 
forms  for  various  canonical  signal  processing  tasks.  Two  such  tasks  are,  in  particular,  data  com¬ 
pression  and  decorrelation.  For  a  given  data  transform,  realized  as  a  unitary  matrix  U,  the  extent 
of  such  activity  can  be  measured  from  the  transformed  data  covariance  matrix.  Thus  if  a  data 
vector  X  has  covariance  C,  its  transform  Ux  has  covariance  D  =  UCU*,  and  the  data  compression 
(respectively  decorrelation)  efficiency  of  the  transform  U  can  be  assessed  by  examination  of  the 
diagonal  (respectively  off-diagonal)  entries  of  D. 

In  order  to  make  a  serious  statistical  study  of  the  efficiency  of  group  transforms  and  filters 
for  the  various  signal  processing  tasks,  it  is  necessary  to  have  an  assortment  of  standardized  sig¬ 
nal  models.  These  fall  into  two  classes:  parametric  models  and  ‘purely  random’  models.  The 
former  determine  after  sampling  structured  covariance  matrices  with  entries  having  a  simple 
dependence  on  a  few  parameters.  The  simplest  and  most  familiar  example  of  this  model  type  is 
the  first-order  Markov  or  auto-regressive  signal  model,  from  which  N  samples  generate  the  covar¬ 
iance  matrix  where  0  <  |p|  <  1,  and  1  ^  i,  j  ^  N.  It  is  somewhat  less  clear,  a  priori,  what 

a  ‘purely  random'  covariance  structure  might  be.  The  object  of  the  following  sections  is  to  clarifiy 
and  discuss  the  term  ‘purely  random’.  Speaking  intuitively  for  the  moment,  this  term  must  be 
precisely  defined  if  we  are  to  seriously  simulate  the  action  of  the  various  transforms,  and  to  even¬ 
tually  say  that  one  or  another  of  them,  for  fixed  data  dimension,  is  superior  in  the  performance 
of  a  particular  task  ‘on  the  average.’ 

1.1  Definitions 

In  the  background  we  have  an  N-dimensional  real  or  complex-valued  second  order  random 
vector  X.  We  will  usually  assume  that  x  has  0-mean:  E(x)  =  0,  the  zero  vector.  The  covariance 
matrix  of  x  is  the  N  X  N  matrix  Cjj  defined  by 

Cx  =  [EfXjXj)] 

Such  matrices  are  characterized  as  being  Hermitian  and  positive  semidefinite.  We  will  in  fact 
always  assume  that  is  actually  positive  definite,  so  as  to  eliminate  degenerate  probability  den¬ 
sity  functions.  Hence  the  eigenvalues  |A|,  .  .  ,  of  are  all  positive;  they  constitute  the  spec¬ 

trum  a(C,)  of  C,,  and  their  relative  size  will  always  be  indicated  bv  subscript; 

We  recall  the  statistical  significance  of  these  eigenvalues:  letting  {d)|,  .  .  .  ,  </»n}  be  the  ortho¬ 
normal  set  of  eigenvectors  corresponding  to  X|,  .  .  .  ,  X^,  we  have 

(a)  Xj  =  var« 

(b)  tr(C^)  =  X,  +  .  .  .  +  Xn  =  E(llx||2) 

(c)  Xn,+  |  +  .  .  .  +  Xfj  =  min  E(d(x,Sn,)2),  m  =  1 . N  -  1 

Sfn 
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The  assertion  here  is  that  X.;  is  the  variance  of  the  ith  principal  component  of  x;  these  random 
variables  occur  as  the  coefficients  in  the  expansion  of  x  in  the  (Karhounen-Loeve)  basis 
d)i,  .  .  .  ,  0^  .  Statement  (b)  is  a  special  case  of  (c)  (take  m  =  0,  there).  The  final  assertion  is 
that  the  best  mean  square  approximation  to  x  by  m-dimensional  subspaces,  S,„,  occurs  when 
is  spanned  by  d>i . error  as  the  indicated  function  of  the  eigenvalues.  For  appli¬ 

cations  of  these  and  related  formulas  to  multivariate  statistics,  pattern  recognition,  and  signal 
processing  (estimating  x  from  noisy  observations),  see,  respectively  References  1,  2,  and  3. 

From  now  on  we  will  make  a  slight  specialization  by  assuming  that  all  components  of  the 
random  vector  x  have  the  same  variance,  which  we  take  to  be  unity.  It  follows  that  the  diagonal 
of  consists  of  ones,  tr(C^)  =  N,  and  the  modulus  of  each  off-diagonal  entry  Cy  satisfies 
Icjjl  <  1.  These  entries  are,  in  fact,  the  correlation  coefficients  of  the  i  and  j  components  of  x. 

Any  such  matrix  is  called  a  correlation  matrix,  and  will  be  our  primary  object  of  study.  Bounds 
and  estimates  for  various  quantities  associated  with  such  matrices  are  reviewed  in  Section  3.  Here 
we  note  that  if  C  is  any  N  x  N  correlation  matrix,  then  K  ||C||  =  X|^  X|  +  .  .  .  +  "  N,  so 

that  the  set  r(N)  of  all  such  C  is  a  compact  convex  subset  of  the  N(N  +  l)/2  -dimensional  real 
space  of  N  X  N  Hermitian  matrices.  (If  the  scalars  are  complex,  this  latter  space  is  of  real  dimen¬ 
sion  N^.) 

In  general,  it  is  difficult  to  tell  by  inspection  whether  a  given  symmetric  or  Hermitian  matrix 
C  with  diagonal  entries  equal  to  one  is  positive  definite,  and  hence  a  correlation  matrix.  Several 
nonlinear  inequalities  involving  the  off-diagonal  entries  must  be  satisfied;  these  correspond  to  the 
positivity  of  the  leading  principal  minors  of  C.  For  example,  given  real  numbers  b,  c,  d  in  (-1,1), 
the  matrix 


•  1  b  c- 

b  1  d 

.  c  d  1 . 

is  a  correlation  matrix  if  and  only  if 
b^  +  +  d^  <  I  +  2bcd. 


Two  simple  sufficient  conditions  for  positive  definiteness  are  available,  however; 

(a)  C  is  diagonally  dominant,  so  that  the  Gershgorin  theorem  can  be  applied,  and 

(b)  C  can  be  partitioned  as 


C  = 


ll  F 
F*  l2 


where  the  I’s  are  identity  matrices,  and  F  is  a  matrix  whose  (spectral)  norm  is  less  than  one. 
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1.2  Notions  of  Randomness 


We  now  want  to  address  the  question  of  randomly  selecting  a  correlation  matrix  of  some 
fixed  size.  Our  particular  interest  in  this  question  has  already  been  indicated  in  the  introductory 
remarks  above,  and  further  motivation  will  be  provided  in  the  next  section;  in  general  we  may 
say  simply  that  a  satisfactory  answer  to  this  question  will  permit  generation  of  random  test  prob¬ 
lems  for  a  variety  of  statistical  methods. 

Roughly  speaking,  any  method  of  generating  random  correlation  matrices  will  begin  by 
generating  some  number  of  pseudorandom  variates  uniformly  distributed  on  the  unit  interval,  and 
then  performing  certain  deterministic  mathematical  steps  to  arrive  at  a  correlation  matrix.  Four 
possible  such  methods  will  be  described  below,  and  two  will  be  discussed  at  some  length.  But  we 
have  to  acknowledge  at  the  outset  that  no  method  is  completely  satisfactory.  This  is  due  to  the 
lack  of  structure  of  the  set  HN)  on  the  one  hand,  and  to  the  presence  of  structure  in  the  indi¬ 
vidual  members  of  r(N)  on  the  other  hand.  That  is,  each  element  C  of  TfN)  has  associated  with 
it,  as  a  matrix,  entries,  eigenvalues,  and  functions  of  these,  such  as  norm,  condition  number,  etc., 
all  of  which  become  random  variables  with  their  own  distributions  which  naturally  depend  on  the 
manner  in  which  C  was  produced.  But  the  set  TfN)  does  not  carry  a  natural  invariant  measure. 
This  deficiency  may  be  contrasted  with  the  cases  of  the  orthogonal  or  unitary  groups,  which 
carry  a  canonical  (unimodular)  Haar  measure.  Nor  is  r(N)  a  homogeneous  space,  such  as  a 
sphere,  on  which  a  transformation  group  acts  and  leaves  invariant  some  measure.  Thus,  while 
such  phrases  as  ‘random  orthogonal  matrix’  and  ‘uniform  distribution  over  the  unit  sphere  S^-l, 
have  a  clear  conceptual  meaning,  and  indeed  there  exist  successful  numerical  procedures  for 
generating  such  variates  (see  in  particular  References  4  and  5  for  the  former  case),  the  situation 
remains  murky  for  correlation  matrices. 

As  a  brief  aside  we  offer  two  remarks.  First,  the  topics  of  approximating  and  efficiently 
sampling  from  the  uniform  (Haar)  distribution  on  finite  or  compact  groups  persist  under  current 
investigation.  In  addition  to  the  references  just  given  for  the  case  of  the  orthogonal  group,  see 
recent  articles  by  L.  Takacs*  for  finite  groups,  and  by  P.  Diaconis  and  various  coauthors  (Refer¬ 
ence  7  and  its  references)  for  an  assortment  of  groups  and  applications.  The  basic  approximation 
result,  that  the  distributions  of  the  successive  terms  in  a  random  walk  on  a  compact  group  con¬ 
verge  vaguely  to  normalized  Haar  measure  provided  that  the  support  of  the  common  distribution 
of  the  individual  terms  is  sufficiently  diffuse,  goes  back  at  least  to  Grenander.*^  The  condition  on 
the  support  of  the  distribution  can  also  be  rephrased  as  a  spectral  property  of  its  (operator¬ 
valued)  Fourier  transform.  Second,  although  as  noted  above.  r(N)  is  not  a  homogeneous  space; 
the  cone  P(N)  of  all  positive  definite  N  x  N  Hermitian  matrices  is  such  a  space.  Namely,  it  is 
acted  on  by  the  general  or  special  linear  groups  according  to  the  rule 

A  -  T  A  T* 

for  A  f  P(N)  and  T  nonsingular.  It  follows  from  general  theory,  including  the  fact  that  these  lin¬ 
ear  groups  are  unimodular.  that  there  is  an  invariant  measure  on  P(N).^ 
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Returning  now  to  the  matter  of  random  correlation  matrices  we  indicate  four  possible 
methods  of  generation;  only  the  last  two  will  be  discussed  in  any  detail,  beginning  in  the  next 
section. 

Method  I:  Direct  Acceptance  —  Rejection 

Here  one  must  obtain  symbolically  the  leading  principal  minors  of  the  general  symmetric 
N  X  N  matrix  with  unit  diagonal.  This  is  possible  for  moderate  size  N  via  a  computer  algebra 
program.  Requiring  these  minors  to  be  positive  then  constitutes  a  set  of  N  -  2  nonlinear  con¬ 
straints  on  the  N(N  -  l)/2  off-diagonal  entries.  A  set  C12 . C|n,  C23 . C2n . 

of  pseudorandom  deviates  uniformly  distributed  on  (-1,1),  or  in  the  open  unit  disc,  is  generated, 
and  tested  to  see  if  the  constraints  are  satisfied.  If  the  constraints  are  satisfied,  a  correlation 
matrix  C  is  obtained;  if  not,  a  new  set  of  uniform  deviates  is  generated,  etc.  Clearly  this  method 
is  at  best  feasible  for  rather  small  values  of  N.  say  N  ^  8.  To  the  author's  knowledge,  the  distri¬ 
butional  aspects  of  the  spectral  features  of  the  resulting  matrices  are  unknown. 

Method  2:  Perturbation  about  a  Mean 

This  method  is  discussed  by  Marsaglia  and  Olkin,'®  which  is  generally  the  most  current 
source  of  information  about  our  subject.  However,  for  our  purposes,  there  is  no  reason  to  have 
in  mind  any  a  priori  mean  value. 

Method  3:  Random  Spectrum 

As  we  know,  the  spectrum  of  an  N  X  N  correlation  matrix  consists  of  N  positive  numbers 
(not  necessarily  distinct)  that  sum  to  N.  As  will  be  recalled  in  the  next  section,  every  such  set  of 
N  numbers  occurs,  so  that  the  possible  spectra  fill  out  a  simplex  in  real  N-space.  Since  it  is 
numerically  feasible  to  generate  pseudorandom  uniform  samples  from  this  simplex,  we  can.  by  a 
succession  of  suitably  chosen  orthogonal  or  unitary  transforms,  arrive  at  a  random  correlation 
matrix.  An  automated  procedure  for  doing  this  latter  task  is  commercially  available  in  the 
IMSL  subroutine  GGCOR.  Statistical  aspects  of  this  method  will  be  discussed  at  some  length  in 
Section  4. 

Method  4:  Random  Gram  Matrix 

As  is  well  known,  every  real  positive  definite  matrix  A  has  a  Cholesky  factorization 
A  =  TT* 

where  T  is  a  uniquely  defined  lower  triangular  matrix  with  positive  diagonal  entries.  Let  the  rows 
of  T  be  denoted  t, . Then 

ajj  -  <[tj,tji> 

and  so  A  can  be  considered  a  Gram  matrix  defined  by  the  vectors  |t| . t,^}  If  also  A  is  a 

correlation  matrix,  then  each  vector  tj  must  have  length  I.  Consequently,  any  procedure  for 
generating  pseudorandom  unit  vectors,  with  any  distribution,  will  result  in  a  random  correlation 
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matrix  of  Gram  type.  These  vectors  may  or  may  oot  end  in  zeros,  as  in  the  Cholesky  factoriza¬ 
tion.  but  naturally  we  do  less  work  if  they  do.  This  method  is  the  most  efficient  of  the  general 
methods  1.  3.  and  4;  some  of  its  statistical  aspects  will  be  discussed  in  Section  5  (see  also  Refer¬ 
ence  10  again). 

The  method  we  might  eventually  choose  for  a  particular  application  will  depend  on  the 
nature  of  the  application  and  just  which  aspect  of  the  random  correlation  matrices  we  wish  to 
have  an  unambiguous  uniform  distribution. 

1.3  Background  and  Motivation 

In  terms  of  the  preceding  introductory  material,  and  prior  to  the  more  technical  develop¬ 
ments  of  the  remaining  sections,  we  will  briefly  review  some  of  the  relevant  statistical  literature. 
Specifically,  we  will  comment  on  the  contents  of  four  articles.  References  11,  12,  13.  and  10, 
listed  in  chronological  order. 

Chalmers  (1975,  Reference  11)  presents  an  algorithm  which  produces  correlation  matrices 
with  a  common  spectrum.  His  motivation  is  the  study  of  strongly  structured  data,  that  is  ,  ran¬ 
dom  vectors  whose  first  two  or  'tree  principal  components  explain  much  of  the  variability  of  the 
data.  Chalmers  attempts  to  distinguish  between  causes  of  the  observed  associations  among  differ¬ 
ent  subsets  of  the  components  of  the  data,  and  whether  these  causes  are  due  to  the  physical 
nature  of  the  variables  themselves,  or  to  some  inherent  structure  in  the  data  as  captured  by  the 
underlying  principal  components.  He  uses  an  empirical  approach  to  generate  other  correlation 
matrices  with  eigenvalues  identical  to  those  observed,  and  then  compares  results  from  these 
matrices  with  those  from  the  original  data.  The  algorithm  itself  is  derived  from  a  geometric 
lemma  which  asserts  the  existence  of  an  infinite  set  of  orthogonal  generators  to  certain  quadratic 
cones  in  real  n-space.  Normalizing  these  generators  then  leads  to  the  columns  of  an  orthogonal 
matrix  which  transforms  a  given  diagonal  matrix  of  eigenvalues  into  the  desired  correlation 
matrix. 

Bendel  and  Mickey  (1978,  Reference  12)  address  the  same  problem  as  Chalmers,  but  more 
systematically,  and  with  more  concern  for  whether  the  resulting  correlation  matrices  are  truly 
‘representative’  of  the  entire  class  of  correlation  matrices  with  given  spectrum,  thought  of  as  those 
which  could  arise  from  a  given  experiment.  Bendel  and  Mickey  note  that  parameterizing  subsets 
of  r(N)  by  structure,  e.g.,  equi-interclass  correlation  (constant  off-diagonal  entries)  or  first  order 
autoregressive  (Markov- 1  data)  leads  to  very  narrow  classes  of  correlation  matrices,  unsuitable 
for  many  applications.  Their  approach  is  to  treat  the  eigenvalues  as  parameters,  especially  when 
they  in  turn  are  functions  of  one  or  two  parameters.  For  example,  the  eigenvalues  might  be 
required  to  form  a  geometric  progression.  In  general,  if  the  parameterized  eigenvalues  are  roughly 
constant,  and  therefore  approximately  equal  to  1,  the  data  variables  are  approximately  indepen¬ 
dent,  while  a  large  spread  in  the  range  of  the  eigenvalues  indicates  strong  interdependence 
between  the  variables. 


Starting  with  a  spectrum  {A|,  .  .  .  ,  Xnj}and  setting  D  =  diag  {a.],  .  .  .  .  the  method  of 
Bendel  and  Mickey  yields  a  correlation  matrix  C  of  the  form 

C  =  U*DU 

where  U  =  VR[^_i  .  .  .  R2Ri.  Here  V  is  a  randomly  chosen  orthogonal  (or  unitary)  matrix  and 
the  R’s  are  matrices  representing  Givens  rotations,  chosen  successively  to  make  one  diagonal 
entry  at  a  time  of  the  product  equal  to  1.  The  V’s  can  be  generated  by  various  procedures;  see 
the  references^'^  already  mentioned  in  Section  1.2.  Bendel  and  Mickey  go  on  to  describe  the 
application  of  their  method  to  the  problem  of  stopping  rules  in  the  statistical  procedure  known 
as  stepwise  regression.  They  also  offer  some  comparisons  between  their  method  and  that  of 
Chalmers. 

Johnson  and  Welch  (1980,  Reference  13)  also  emphasize  the  use  of  simulated  data  to  test 
alternative  selection  procedures  in  stepwise  regression,  particularly  to  build  confidence  in  the  use 
of  such  procedures  on  real  data  with  uncertain  structure.  If  the  joint  distribution  of  the  depend¬ 
ent  and  regressor  variables  is  Gaussian  then  it  is  standard  to  sample  randomly  from  it  by  factor¬ 
ing  the  covariance  matrix  and  using  a  string  of  pseudorandom  N(0,1)  variates  (c.f.  IMSL  subrou¬ 
tines  GGNSM).  Thus  only  the  structure  of  the  distribution  remains  to  be  specified,  and  this,  of 
course,  is  completely  determined  by  the  (mean  and)  covariance.  If  this  is  assumed,  as  they  do,  to 
be  of  correlation  type,  then  it  can  be  partitioned  as 

I  p* 

where  C  is  the  intercorrelation  matrix  of  the  regressors,  and  p  is  the  vector  of  correlation  coeffi¬ 
cients  between  the  regressors  and  the  dependent  variable.  So  the  emphasis  is  on  generating  such 
C's.  and  this  is  done  by  viewing  C  as  a  Gram  matrix;  C  =  TT*.  with  the  rows  of  T  being  unit 
vectors.  Johnson  and  Welch  suggest  generating  each  entry  of  T  from  a  symmetric  beta  distribu¬ 
tion.  varying  the  free  parameter  from  row  to  row.  They  note  that  a  certain  control  over  some 
aspects  ot  the  matrices  so  defined  can  be  maintained,  such  as  the  degree  of  correlation  between 
some  regressors,  and  the  coefficient  of  determination  for  the  complete  regressor  set. 

Finally.  Marsaglia  and  Olkin  (1984,  Reference  10)  give  a  rigorous  mathematical  description 
of  Methods  2  through  4  described  in  the  preceding  section.  Their  major  result  is  to  obtain  the 
explicit  form  of  the  distribution  of  the  entries  of  a  random  Gram  correlation  matrix  C  =  TT*. 
when  the  entries  of  T  are  generated  in  a  particular  fashion.  Some  of  this  work  will  be  reviewed 
later,  in  the  appropriate  context. 
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2.  TWO  PRINCIPAL  METHODS 


As  noted  in  Section  1.2,  only  the  methods  labeled  as  (3)  and  (4)  are  to  be  discussed  in  any 
detail  herein.  We  begin  this  discussion  now,  setting  the  stage  for  the  new  results  which  will  be 
presented  ater  on,  after  a  review  of  some  salient  linear  algebra  (see  Section  3). 

2.1  Random  Spectrum 

As  we  know,  the  spectrum  of  a  correlation  matrix  C  e  r(N)  has  a  spectrum  o(C)  =  (X] . A-y,  } 

consisting  of  N  positive  numbers  of  sum  N.  The  set  of  all  such  N-tuples  defines  a  simplex  S.^!, 
and  we  first  want  to  observe  that  every  point  in  S^  occurs  in  this  fashion,  that  is, 

UMC):  Cf  r(N)}  =S^^ 

This  is  a  consequence  of  a  general  characterization  of  Hermitian  matrices  due  to  several  authors. 
Namely,  if  A  is  a  Hermitian  matrix  of  order  N  with  eigenvalues  A,^...^An,  and  diagonal 
entries  d|  ^...^  df,j,  then 

d, +...+ dfc^  A, +...+ Afc  (2.1) 

for  1  -  1,  and 

d|  +...+  djyj  =  A|  +...+  Afj  =  tr(A)  (2.2) 

(Schur,  1923).  Conversely,  given  real  numbers  {dpAj  satisfying  all  these  conditions,  there  exists  a 
real  symmetric  matrix  A  with  diagonal  entries  dj,...,dN,  and  eigenvalues  Ai,...,Ajv(  [Horn  (Refer¬ 
ence  14),  Mirsky  (Reference  15)],  In  our  case,  however,  the  result  follows  more  directly  from  a 
theorem  of  Fillmore,  namely  that  any  matrix  A  is  unitarily  equivalent  to  one  with  a  constant 
diagonal.  This  in  turn  is  an  easy  consequence  of  the  convexity  of  the  numerical  range  W(A),  so 
that  tr(A)/N  e  W(A),  and  an  induction  argument. 

The  upshot  of  the  above  paragraph  is  that  given  (A|,...,An)  e  Sn  there  is  a  correlation  matrix 
C  with  these  numbers  as  its  spectrum.  How,  in  practice,  is  such  a  matrix  to  be  obtained?  As 
already  noted,  answers  have  been  given  by  Chalmers  and  Bendel-Mickey;  there  is  also  the  paper 
by  Chan  and  Li*^  which  more  generally  provides  an  algorithm  for  constructing  a  real  symmetric 
matrix  with  given  diagonal  entries  and  eigenvalues  satisfying  the  conditions  (2.1)  and  (2.2).  It 
appears  that  for  present  purposes  the  most  natural  method  of  obtaining  the  matrix  is  that  pro¬ 
posed  by  Bendei  and  Mickey,  namely, 

C=  R^.|...R2  R*  D  R|  R2...Rn.j  (2.3) 

where  D  =  diag[A|,...,Af,,],  and  R^  is  a  rotation  in  the  plane  spanned  by  the  standard  unit  vectors 
2|j  and  2|j+|.  The  matrix  R(j  has  the  form 


with  c2  +  s2  =  1 .  The  rotation  angle,  arc  cos(c),  is  chosen  so  that  the  k-th  diagonal  entry  of  C  is  1 
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We  can  strengthen  the  preceding  remark  by  replacing  the  diagonal  matrix  D  in  Equa¬ 
tion  (2.3)  by  A  =  U*DU,  U  unitary.  That  is,  A  is  an  arbitrary  positive  definite  matrix  with  spec¬ 
trum  Then  A  can  still  be  transformed  into  a  correlation  matrix  C,  as  before,  and 

there  are,  in  fact,  exactly  4  choices  for  each  R|^  in  Equation  (2.3). 

To  see  this,  consider  first  the  specification  of  R|.  R^jARj  should  have  a  diagonal  entry  =  1. 
Let  Ap  be  a  principal 


with  a,  d  >  0.  Let  Q  I 

c  s 
-s  c 

or  a  reflection 

c  s 
s  -c 

Then  the  upper  left  entry  of  Q*ApQ  is  ac^  +  2Re(b)cs  +  ds^,  respectively.  From  the  behavior  of 
the  Rayleigh  quotient  of  Ap,  we  see  that  this  quadratic  form  in  (c,s)  will  somewhere  assume  the 
value  1  if  and  only  if  Ap  has  one  eigenvalue  ^  1  and  the  other  ^  1.  Now  since  a,  d  are  diagonal 
entries  of  A,  and  tr(A)  =  N,  Ap  can  be  chosen  so  that  one  of  a,  d  is  <  I,  and  the  other  ^1.  Its 
eigenvalues  X  (  ^  X2  then  satisfy 

Ki  -  min<ApX,x>  <  min(a,d) 

^  1  ^  max(a,d)  ^  max<ApX,x>  ^  X  | 

So  the  condition  on  Ap  is  satisfied,  and  therefore  4  choices  of  Q  exist  to  yield  a  1  in  the  upper 
left  corner  of  Q*ApQ.  R|  is  then  created  immediately  as  a  direct  sum  of  Q  and  an  identity 
matrix.  The  whole  procedure  can  then  be  repeated,  since  the  sum  of  the  remaining  diagonal 
entries  of  R*i  A  Rj  is  N  -  1. 

The  preceding  remarks  are  a  slight  elaboration  of  some  made  at  the  end  of  Section  4  of 
Reference  10.  We  note  that  in  the  discussion  so  far  of  this  section  there  are  no  issues  of  random¬ 
ness.  These  can  be  introduced  in  two  stages.  First,  if  a  point  (X|,...,X)e  Sjyj  is  given,  we  can  form 
the  corresponding  diagonal  matrix  D,  select  an  orthogonal  matrix  V  at  random  from  the  orthog¬ 
onal  group  0(N)  with  normalized  Haar  measure,  and  select  a  succession  of  orthogonal  matrices, 
one  of  4  choices  at  random  at  each  step,  so  as  to  transform  V*DV  into  a  correlation  matrix  C. 
This  C  may  fairly  be  said  to  be  a  random  correlation  matrix  with  specified  spectrum.  Second,  the 
spectrum  may  itself  be  chosen  from  some  probability  distribution  of  S^.  The  resulting  matrices 
are  said  to  have  a  random  spectrum.  The  special  case  where  the  distribution  of  is  uniform 
will  be  discussed  at  some  length  in  Section  4.  Some  issues  here  are  that  this  method  is  evidently 


2X2  submatrix  of  A,  say 
a  b' 

b  d. 

je  a  2  X  2  orthogonal  matrix,  either  a  rotation 


I 
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rather  computationally  expensive,  and  the  distribution  of  the  entries  of  the  resulting  correlation 
matrices  is  not  well  understood.  However,  we  will  be  able  to  say  something  about  the  distribu¬ 
tion  of  some  global  features  of  these  matrices. 

2.2  Random  Gram  Matrix 

We  first  quickly  review  the  concept  of  Gram  matrix.  Let  X) . xj^  be  linearly  independent 

vectors  in  any  pre-Hilbert  space.  The  corresponding  Gram  matrix  is  the  N  X  N  Hermitian  matrix 
G  =  G(xi,...,xn)  with  (i j)-entry  =  <Xi,Xj>.  The  determinant  g(X[,...,XN)  is  called  the  Gramian  of 
the  set  {x|,...,Xjy(j.  Clearly  the  covariance  matrix  of  a  set  of  normalized  second  order  random 
variables  falls  under  this  definition.  In  general.  Gram  matrices  are  positive  semidefinite  as  follows 
from  the  formula 

<Ga,a>  =  II  ^  ajXill^  ^  0 

for  any  N  complex  numbers  Of  aj^.  Further,  as  already  noted  in  Section  1.2,  by  Cholesky  fac¬ 
torization,  any  real  positive  definite  matrix  is  a  Gram  matrix;  more  generally,  any  complex  posi¬ 
tive  semidefinite  matrix  has  a  positive  semidefinite  square  root,  and  so  is  a  Gram  matrix. 

The  Gramians  are  symmetric  functions  of  their  arguments,  and  obey  the  inequalities 

0^g(x . .  ||xil|2  ...  ((xnI|2  (2.4) 

with  equality  on  the  left  if  an  only  if  |x,}is  linearly  dependent,  and  equality  on  the  right  if  and 
only  if  |xj}  is  orthogonal.  To  prove  the  right  hand  inequality  we  first  reduce  to  the  case  that 
each  Xj  is  a  unit  vector,  and  then 

g(X| . XN)'/N  =  (n\i)l/N 

where  { Aj }  =  a(G). 

One  senses  from  this  that  the  Gramian  and  other  spectral  features  of  the  Gram  matrix  bear 
some  relation  to  the  relative  orientation  of  the  vectors  {Xj|.  Along  this  line  we  recall  that  if  the 
vectors  Xj  belong  to  R^,  then 

N 

vol({2]  AiXi;0<Ai=^ei})  = 

I 

N 

g(x,,...,XN)'/2  (2.5) 

I 

so  that  in  particular  g(X|,...,x;sf)  is  the  square  of  the  volume  of  the  parallelepiped  spanned  by  the 
set  I  Xj }.  If  the  Xj  belong  to  some  other  space.  Equation  (2.5)  serves  to  define  this  volume. 
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In  addition  to  the  simple  Hadamard  inequality  in  Equation  (2.4),  we  have  further 
g(x,,...,Xn,.yi,...,yN)<  g(xi,...,Xm)  g(yi.—yN) 

and  in  fact  the  ratio  of  the  left  side  to  the  right  side  is  known  to  be  sin^  aj...sin’  oyj,  where 
a  I . M  ^  N,  are  the  angles  between  spanjxj}  and  span|yjj. 

Gram  matrices  occur  naturally  in  all  manner  of  least  squares  problems,  such  as  Gram- 
Schmidt  orthogonalization,  linear  regression,  and  pseudoinversion.  Indeed,  the  basic  problem  of 
computing  the  orthogonal  projection  onto  span  |  Xj }  requires  the  solution  of  a  linear  system  with 
Gram  coefficient  matrix.  It  is  familiar  that  as  the  basis  vectors  Xj  deviate  more  from  orthogonal¬ 
ity,  such  problems  become  more  ill-conditioned,  and  associated  statistical  procedures  are  said  to 
suffer  from  ‘collinearity.’  For  example,  if  Xj(t)  is  the  monomial  t',  and  the  inner  product  comes 
from  some  Lebesque-Stieltjes  measure  with  compact  support,  then  the  corresponding  Gram 
matrices,  indexed  by  N,  have  a  condition  number  that  grows  at  least  as  fast  as  4^;  the  classical 
Hilbert  matrices  are  special  cases;  the  main  result  of  this  reference  will  be  discussed  later  in 
Section  3.  For  a  recent  review  of  the  collinearity  problem  with  suggestions  for  its  measurement 
by  more  subtle  indicators  than  simply  condition  number,  see  Reference  19. 

As  a  reference  point  for  later  use,  we  record  here  a  well-known  distance  formula  involving 
Gramians.  Let  M  =  span  {x|,...,X(<4|,  and  let  x  be  another  point  in  the  space  containing  M.  Then 

g(Xi,...,XN,x) 

dist(x,M)2  =  — ; - -  (2.6) 

g(X,,...,XN) 

Recall  that  random  Gram  matrices  were  defined  by  Method  4  in  Seaion  1.2.  In  present 
notation  the  Xj  are  taken  to  be  random  vectors  uniformly  distributed  over  the  sphere  S^*'  in  rN. 
We  now  report  some  results  from  a  small  simulation,  intended  to  compare  such  matrices  with 
those  of  random  spectrum  (Method  3).  We  give  here  only  the  cases  N  =  5  and  10,  as  they  appear 
typical  of  all  cases  considered.  In  each  case,  the  summary  statistics  are  based  on  1(XX)  trials.  In 
the  left  column  of  Table  I,  ‘c.n.’  means  condition  number,  ‘F  norm’  means  Frobenius  norm,  and 
‘norm’  means  spectral  norm.  Also,  ‘trimmed’  means  that  the  largest  I  percent  and  smallest  1  per¬ 
cent  of  the  samples  have  been  deleted. 

Probably  the  most  striking  contrast  to  be  made  on  the  basis  of  this  numerical  experiment  is 
the  excessively  high  condition  numbers  of  the  random  Gram  matrices  relative  to  those  of  the 
matrices  with  random  spectrum.  This  aspect  of  the  data  persists  even  after  trimming,  and  after 
passing  to  medians.  It  strongly  suggests  that  random  Gram  matrices  do  not  have  a  random  spec¬ 
trum.  It  also  raises  interesting  questions  about  the  relative  orientation  of  a  batch  of  2  or  more 
vectors  drawn  independently  from  the  uniform  distribution  on  the  (N  -  l)-sphere.  Some  of  these 
will  be  considered  in  Section  5  below. 
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TABLE  1 

Statistics  for  Random  N  X  N  Matrices 


z 

II 

01 

10 

N  =  5 

10 

Mean  c.n. 

111. 

553. 

1.37E6 

7.85E6 

Standard  Deviation 

754. 

1.1 8E4 

4.29E8 

2.19E8 

Median  c.n. 

15.8 

40.5 

114. 

809. 

Interquartile  Range 

30.7 

80.9 

446. 

3533. 

Trimmed  Mean 

39.2 

95.4 

1.57E3 

1.57E4 

Standard  Deviation 

70.3 

157. 

5.48E3 

6.65E4 

Mean  F.  Norm 

2.88 

4.24 

2.99 

4.36 

Mean  Norm 

2.31 

2.92 

2.43 

2.94 

Mean  Min.  E-Value 

.191 

.100 

.042 

.009 

Standard  Deviation 

.161 

.093 

,054 

.012 

Random  Spectrum 

Random  Gram 

II 


3.  ASPECTS  OF  NUMERICAL  LINEAR  ALGEBRA 


This  section  contains  a  brief  review  of  some  quantitative  aspects  of  linear  algebra  that  are 
pertinent  to  the  material  that  follows.  For  general  background  information  on  matrix  theory  we 
may  refer  to  two  recent  volumes:  Horn  and  Johnson^O  or  Lancaster  and  Tismenetsky.^i  More 
specialized  treatments  of  numerical  linear  algebra  are  given  by  Stewart22  and  Golub-van  Loan.23 


3.1  Bounds  on  Norms  and  Eigenvalues 


Given  an  N  X  N  matrix  A  we  shall  have  occasion  to  use  its  operator  or  spectral  norm 
II All  =  max{||Ax||/ ||x||:  x^d} 
and  its  Frobenius  norm 

In  terms  of  the  positive  part  P  =  (AA*)*/2  of  A,  we  have 

0<llAll  =  r^(P)<||A||F  =  tr(P)  (3.1) 

where  r^(-)  means  spectral  radius.  Bringing  in  the  eigenvalues  A-l  ^  .  .  .  ^  A-N  of  A,  and  the  singu¬ 
lar  values  S|  ^  .  .  .  ^  Sn  (these  are  the  eigenvalues  of  P),  we  have  ||  A||  =  S|,  and 

N  -  N 

2  N  <  1IA|I^=  S  (3-2) 

I  1 

with  equality  if  and  only  if  A  is  normal  (theorem  of  Schur  and  Mirsky). 

For  general  matrices  A,  the  singular  values  have  many  fascinating  properties  and  applica¬ 
tions,  such  as  min-max  characterizations,  smooth  dependence  on  A  (which  leads  into  perturba¬ 
tion  theory),  and  geometric  interpretations  as  distances  from  A  to  spaces  of  matrices  of  lower 
rank.  This  latter  property,  on  the  one  hand,  leads  into  regularization  techniques  for  least  squares 
signal  processing  and,  on  the  other,  permits  generalization  to  compact  operators  on  infinite 
dimensional  spaces  (s-number  theory). 

Let  us  now  specialize  to  the  case  of  primary  interest  here,  namely,  that  where  A  =  C  is  a 
correlation  matrix.  Then  we  have 


1^  IICII  =  r„(C)  =  min{||C||} 

^  max{||roW|||,, .  .  .  ,  HrowNlli} 


(3.3) 


where  11|  ||1  refers  to  a  general  matrix  norm  induced  by  some  vector  norm,  and  H-Hi  is  the 
2'-vector  norm.  The  expression  “max{.  .  above  is  just  the  matrix  norm  induced  by  the 
i^-vector  norm.  Its  advantage,  as  with  the  bound  IlCHp,  is  that  it  is  immediately  computable 
from  the  entries  of  C.  Either  of  the  extremes  I,  N  can  be  reached  by  some  C  e  r(N).  The  second 
equality  is  true  in  much  greater  generality;  in  fact,  it  is  true  for  any  operator  on  a  Banach 
space.  24 
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If  A  is  positive  semidefmite,  then 


lajjl  ^  —  (aii  +  ajj) 


showing  in  particular  that  all  off-diagonal  entries  of  a  correlation  matrix  have  modulus  ^  1.  Of 
course,  such  a  matrix  need  not  be  diagonally  dominant. 


An  improvement  on  the  bound  ||C|1  <  IlCljp  has  been  noted  by  Leclerc,^^  specifically  for 
correlation  matrices.  Namely, 


ICIl  <  I  + 


/N  -  1 
\  N 


1/2 


So) 


(3.4) 


where  =  sum  of  squares  of  off-diagonal  entries  of  C.  The  right-hand  inequality  here  is  strict 
unless  all  off-diagonal  entries  have  modulus  =  I.  This  bound  on  ||C||  can  be  either  larger  or 
smaller  than  the  ‘max'  bound  of  Equation  (3.3). 


At  this  point  we  have  given  some  upper  bounds  for  ||C||,  and  hence  for  ail  the  (positive) 
eigenvalues  of  C.  Upper  bounds  for  ||C‘*|i  are  equivalent  to  lower  bounds  on  the  eigenvalues  of 
C,  using  ||C*‘|(  =  r^fC*');  note  that  (C**),  while  still  positive  definite,  is  no  longer  a  correlation 
matrix  in  general.  This  kind  of  bound  is  not  of  particular  interest  to  us  here,  but  lower  bounds 
on  I1C'*11  are  important,  in  connection  with  condition  number  estimates,  and  will  be  discussed 
later  on.  Here  we  will  just  recall  an  inequality  of  Kato,^^  which  gives  abound  on  11A*’|1,  for  any 
nonsingular  A: 

llA-Ml^||A||N-i/idet(A)|  . 


There  are  innumerable  inequalities  pertaining  to  the  eigenvalues  of  positive  definite  matrices 
(and  operators).  We  mention  Just  two,  for  products  and  sums  of  eigenvalues,  due  to  K.  Fan. 

With  A  positive  definite  and  its  spectrum  ordered  as  ^  .  .  .  ^  ^n: 

N 

^N-l  •  •  •  ~  f"*"  n  ^ACj,  ei> 

i=k 

k 

A(  +  .  .  .  +  A|j  =  max  <Aei,  ej> 
i=I 

k  =  1 . N,  where  the  min  (respectively  max)  is  taken  over  all  sets  of  n  -  k  +  I  (respectively  k) 

orthonormal  vectors  (see  Reference  27,  Chapter  2,  for  these  and  many  others). 

Finally,  we  mention  the  concept  of  spread  of  a  matrix  A.  This  is  the  quantity 

S(A)  =  diam  a(A)  s  inaxlXi  -  AjI  (3.5) 
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When  A  =  C,  a  correlation  matrix,  the  following  bounds  on  S(C)  can  be  derived; 

2  max  lc,jl  <  S(C)  ^  (2(1|C1||-  N))''^ 

Since  l|C||p2  =  S  the  last  inequality  offers  a  lower  bound  on  this  quantity.  But,  in  fact,  a 
stronger  2-sided  inequality  can  be  established,  namely 

4  S(C)2  +  N  ^  lICIll^  4  N  S(C)2 
2  4 

by  working  with  eigenvalues. 

3.2  Condition  Number  Estimates 

The  condition  number  «(A)  of  an  arbitrary  matrix  A  is  defined  by 

<c(A)=  IIAII  IIA^II  =  ||A||  ||A-«(|  (3.6) 

where  the  means  pseudoinverse,  and  the  second  equality  is  naturally  only  applicable  if  A  is 
nonsingular.  Note  that  this  is  the  spectral  condition  number;  other  matrix  norms  might  be  used 
in  Equation  (3.6).  In  terms  of  the  singular  values  of  A  we  have 

1^k(A)  =  s,/sn  (3-7) 

with  equality  if  and  only  if  A  is  a  nonzero  multiple  of  a  unitary  matrix.  [Naturally,  Equation 
(3.7)  is  restricted  to  nonsingular  A.]  Many  kinds  of  singular  matrices  A  can  have  «’(A)  =  1;  for 
instance,  orthogonal  projections  and,  more  generally,  partial  isometries. 

Condition  numbers  are  widely  used  as  measures  of  sensitivity  of  the  solution  of  linear  sys¬ 
tems  to  inaccuracies  in  the  data.  Similarly,  the  condition  number  of  the  matrix  of  eigenvectors  of 
a  diagonalizable  matrix  measures  the  closeness  of  an  approximate  eigenvalue  to  the  true  spec¬ 
trum.  Roughly  speaking,  the  percentage  change  in  the  (least  squares)  solution  x  of  the  system 
Ax  =  b  is  bounded  by  the  percentage  variation  in  the  data  b  times  k(A),  and  this  bound  cannot 
be  lowered.  Thus  >c(A)  is  a  measure  of  the  inherent  resistance  of  a  particular  system  to  accurate 
solution,  and  which  does  not  depend  on  the  particular  numerical  method  employed.  The  larger 
the  condition  number,  the  more  'ill  conditioned'  a  particular  system  is,  and  the  less  we  can  infer 
a  small  error  from  a  small  residual. 

We  might  also  remark  that  k(A)  can  be  characterized  geometrically  by  the  least  angle  i// 
resulting  as  A  is  applied  to  all  possible  pairs  of  orthonormal  vectors;  precisely 

(f(A)  =  cot(i^/2) 

It  is  instinctive  to  want  to  measure  ill  conditioning  by  some  function  of  the  eigenvalues,  but 
this  is  only  fruitful  for  normal  matrices.  For  example,  there  is  the  N  X  N  'Kahan  matrix’ 
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]  -1  ...  -1 

0  1 

-1 

_0  .  .  .  1 

which  clearly  has  all  eigenvalues  =  1,  yet  a  condition  number  >  However,  when  A  is 

positive  definite  with  eigenvalues  Aj  ^  ^  Aj^,  then 

k(A)  =  A|  /  Af^  (3.8) 

and  we  have  the  inequality  of  Kato: 

dct(A)  V  N  / 

Thus,  for  correlation  matrices  C,  /f(C)det(C)  is  a  bounded  function.  Note  that  Equation  (3.8) 
defines  a  different  notion  of  ‘spread'  of  the  spectrum,  in  contrast  with  the  quantity  S(A)  of  Equa¬ 
tion  (3.5).  Also  note  that  Equations  (3.7)  and  (3.8)  together  imply  that 

■<(A*A)  =  k(AA*)  =  /f(A)2 

showing  how  the  familiar  ‘normal  equations’  of  many  least  squares  procedures  can  become  very 
ill-conditioned  (and,  eventually,  motivating  the  use  of  factorization  methods  which  deal  only  with 
A,  as  an  alternative). 

In  1955  J.  Riley28  used  the  fact  that 

ic(A  + Al)^,c(A)  (3.9) 

for  any  positive  definite  A  to  suggest  an  iterative  improvement  procedure  for  solving  an  ill- 
conditioned  linear  system  Ax  =  b.  This  was  a  forerunner  of  the  ridge  regression  and  regularization 
methods  of  statistics  and  signal  processing,  which  trade  off  some  bias  for  lowered  mean  square 
error.  The  inequality  (3.9)  was  greatly  extended  by  Marshall  and  Olkin^^  who  proved  that 

k(A  +  B)  ^  k(A) 

whenever  A  and  B  are  positive  definite  with  k(B)^  »c(A). 

We  now  turn  to  the  matter  of  lower  bounds  for  condition  numbers.  These  will  be  of  greatest 
interest  for  the  case  of  Gram  matrices,  but  we  consider  first,  briefly,  the  general  case.  (We  note, 
too,  the  considerable  interest  in  recent  years  in  numerical  estimates  -  not  bounds  -  for  condition 
numbers,  by  estimating  some  norm  of  the  inverse  matrix.^®'7f72 

First,  if  A  is  any  nonsingular  matrix,  with  eigenvalues  ordered  by  modulus:  |A||  ^  .  .  .  ^(An|, 

then 

1A|/An1^»c(A)  (3.10) 
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This  follows  from  the  relations 


||A-‘||->  =  inf{||Ax|l:  llx||  =  1} 

<l|Ae||  =|A| 

where  e  is  any  unit  eigenvector  associated  with  A«7(A).  Of  course,  as  the  earlier  example  of  the 
Kahan  matrix  illustrates,  the  left  side  of  Equation  (3.10)  may  severely  underestimate  the  true 
condition  number  «(A),  when  A  is  not  normal. 


Now  assume  that  A  is  positive  definite.  A  variant  of  the  well-known  Kantorovich  inequal- 
ity^^  tells  us  that 

(m,  +  mi)^ 

1|x1|2^<Ax,x><A-‘x,x>^  -  ||x||2  (3.11) 

4m  I  m2 

provided  that 

m]  I  ^  A  ^  m2  I 

for  0  <  mj^  m2.  Taking  m]  (respectively  m2)  to  be  the  least  (respectively  greatest)  eigenvalue  of 
A,  and  x  any  unit  vector,  we  obtain 

4<Ax,x><A**x,xX  <  —  +  2  <  AC  +  3 

AC 


yielding  a  lower  bound  for  k  =  ac(A)  for  each  x.  Of  course,  an  estimate  involving  A'*  is  not  of 
great  practical  value. 

Another  kind  of  inequality  comes  from  the  theory  of  Schur  (or  Hadamard)  products  of 
matrices.  We  won’t  review  this  concept  in  any  detail  here;  see  Reference  34  for  a  nice  survey. 

This  product,  for  conformable  matrices  A,  B,  is  defined  by 

[A  •  B]jj  -  ajj  ’bij 

This  multiplication,  unlike  the  usual  one,  is  commutative.  The  original  result  of  Schur  is  that 
if  A,  B  are  positive  semidefinite,  then  so  is  A  •  B.  An  inequality  of  Fiedler  (1961)  for  positive 
definite  A  reads 

A-A-‘^I  (3.12) 

Note  that,  as  a  consequence  of  either  this  or  the  left  side  of  the  Kantorovich  inequality,  when  C 
is  a  correlation  matrix, 

[C-%^1,  i=I . N 

In  1982  M.  Marcus  proved  the  matrix  norm  inequality 

||A-  B||  ^  ||A||  IIBII 

for  the  Schur  product.  Taking  B  =  A*’  yields  a  lower  bound  for  the  condition  number: 
llA-  A->|1<ac(A) 
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In  preparation  for  part  of  our  discussion  in  Section  S,  we  want  to  consider  specifically  the 
case  where  A  is  a  Gram  matrix: 

A  =  G  =  G(X|,  .  .  .  ,  Xf,j) 

in  the  notation  of  Section  2.2,  with  each  Xj  a  unit  vector  in  some  inner  product  space  X.  As 
already  remarked  in  Section  2.2  it  has  been  empirically  noted  that  many  common  Gram  matrices 
tend  to  be  ill-conditioned,  and  an  inequality  derived  in  Reference  18  can  be  used  to  quantify 
these  observations  by  providing  a  lower  bound  for  k(G)  in  terms  of  the  relative  orientations  of 
the  vectors  |xj}.  By  virtue  of  our  own  numerical  experiments  reported  earlier,  ill  conditioning  is 
a  prominent  feature  in  random  Gram  matrices  also.  We  will  now  discuss  an  improved  version  of 
this  inequality,  and  its  sharpness.  These  results  are  purely  deterministic;  statistical  implications  are 
deferred  to  Section  5. 


A  fixed  Gram  matrix  G  =  G(X| . Xj^j),  ||Xj||  =  J.  i  =  1 . N  will  be  used.  G  is  a  corre¬ 

lation  matrix,  and  ||G||  =  r^fG),  so  the  problem  is  to  find  a  lower  bound  on  )|G”*1I  in  terms  of 
the  vectors  {Xj}.  Let  M  (respectively  Mj)  be  the  subspace  of  X  spanned  by  {XjJ|N  (respectively 
{xj-.j  ^  i}|^).  Let  {vj}  be  the  basis  for  M  that  is  dual  to  {xj);.  Also,  for  an  arbitrary  real  or  com¬ 
plex  unit  vector  e  (X  is  real  or  complex),  let  b  =  G'^e.  Then 

||G-'||  ^  <G->  e,e>  =  <b,Gb> 

=  ll5;biXj||2=||v||2  . 

Now,  with  V  as  just  defined,  it  is  easily  checked  that 


b  = 


<Xp  v> 


so  that  if  V  =  Vj,  one  of  the  dual  basis  vectors  in  M,  then  Gb  =  ej,  the  standard  unit  basis  vector. 
We  also  observe  that,  since  M;  is  of  codimension  I  in  M,  the  duality  formula  for  distance, 

dist(x,Mi)  =  max(|i/r(x)|;  (3.13) 

for  xeM,  implies  that 


dj  =  dist(Xi,Mi)  =  <Xi,  Vj/  ||Vil|> 

=  l/l|Vill  . 

Putting  all  this  together,  we  conclude  that 

IIG-'ll  ^<G->ei,  ei>=  ||Vil|2 

,  8i(Xi . Xn) 

=  1/d?  = - 

'  g(X|, - Xn) 


(3.14) 
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where  the  last  equality  follows  from  the  Gramian  distance  formula  of  Equation  (2.6),  and  'g;' 
means  the  Gramian  with  Xj  deleted. 

The  ensuing  inequality 

||G-'!I  ^  max  dj^r — ! —  (3.15) 

min  d^ 

is  due  to  J.  Taylor  (Reference  18,  p.  46).  The  major  difference  between  Taylor’s  approach  and 
the  one  we  are  using  is  that  the  duality  formula  Equation  (3.13)  strengthens  the  inequality  by 
avoiding  reliance  on  the  Schwarz  inequality.  Thus  the  sole  source  of  inequality  in  Equation  (3.15) 
is  the  inequality  appearing  in  Equation  (3.14).  This  inequality  is  only  a  measure  of  the  behavior 
of  the  Rayleigh  quotient  for  G‘*,  and  does  not  explicitly  involve  the  Gram  structure  of  G.  Hence 
the  following  theorem  and  proof  gives  a  measure  of  the  tightness  of  Taylor's  inequality  Equation 
(3.15). 


Theorem 

sup 

II  A-' 11 

=  N. 

A  t  r(N) 

max  <A*'ei,ei> 

Proof  For  notational  ease,  we  will  replace  A'*  by  A,  and  then  max  {<Aej.  ei>:  i  =  I.  .  .  .  ,  Nj 
by  m(A).  We  first  note  that  if  A  is  any  N  X  N  positive  definite  matrix, 

1<||A||/m(A)<N 

and  that  these  bounds  are  sharp  (within  this  larger  class  of  matrices).  The  left  inequality  is  trivial, 
and  is  achieved  for  diagonal  matrices.  The  right  inequality  follows  from 

IIAII  =  r„(A)  =  \,^tr(A)^NM(A)  . 

To  verify  its  sharpness,  let  e  >  0,  and  D  =  diag(l,e . e),  and  apply  the  theory  in  Section  2.1  to 

obtain  A,  unitarily  equivalent  to  D,  with  constant  diagonal.  Then 

1  =  II  All  <  tr(A)  =  N|i(A)  =  1  +  (N  -  l)e 

now  let  «  J  0.  So  the  point  of  the  theorem  is  that  if  the  A’s  are  restricted  to  the  class 
I  A;  A''  er(N)},  the  upper  bound  on  l|Ai|//a(A)  does  not  decrease. 

To  complete  the  proof,  consider  a  special  family  of  A’s,  namely,  {A;  A  =  al^  +  B,  bjj  = 

( 1  -  Sjjjb},  where  a  >  b  >  0.  We  have 

||A||/M(A)  =  (a  +  (N-  l)b)/a 

=  l+(N-l)b/a  (3.16) 

and  it  will  be  shown  that  a  and  b  can  be  chosen  so  that  A''  cFfN)  and 
b 

lim  —  =  i.  (3.17) 

atoo  a 
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Let  A  =  det(A).  We  have 

A  =  (a-b)N-i(a  +  (N  -  l)b) 
and  since  the  diagonal  entries  of  A'*  are 
<A‘*ej,ei>  =  i‘**-cofactor/  A 

it  follows  that 

^  {a  +  (N-2)bKa-b)N-2 

<A-'ei,ei  >  =  - 

(a  +  (N-  l)b)  (a-b)N-i 

So,  if  A'‘  is  to  be  a  correlation  matrix,  a  and  b  must  satisfy  the  equation 
a  +  (N  -  2)b  =  (a  -  b)  (a  +  (N  -  l)b). 

If  we  treat  this  as  a  (quadratic)  equation  for  b  and  solve  it,  we  obtain 

(a  -  1)  (N  -  2)  -t-  ((N  -  2)2(a  -  1)2  -t-  4(N  -  1)  (a^  -  a))‘/2 

2(N  -  1) 

After  dividing  both  sides  by  a.  and  manipulating,  we  have 

^  _  _1_  N  -  2  \/(N2  e)  -  N 

a‘’a2N-2"'  2N-2 


where 


e  = 


(N  -  2)2 
a 


(-  -2)-  -(N-1)<0 
a  a 


This  shows  that  b<  a  and  that  the  limit  in  Equation  (3.17)  is  1.  as  required.  QED 

At  this  point  we  might  justify  an  assertion  made  just  after  Equation  (3.3),  namely,  that 
sup  II All  =  N 
Atr(N) _ 

We  know  from  that  equation  that  this  supremum  is  at  most  N.  That  it  is  not  less  than  N  follows 
from  consideration  of  the  same  family  of  matrices  just  used,  and  the  value  of  the  norms  of  such 
matrices  given  in  Equation  (3.16);  just  take  a  =  I  and  let  btl. 

To  sum  up;  for  a  Gram  matrix  G  =  G(X| . X|y).  we  have  the  lower  bound  on  »f(G),  due  to 

Taylor; 

»c(G)  ^  I  /  min  d,^ 
i 


an  equivalent  form 

gi(X| . xn) 

»c(G)  ^  max - 

i  g(x,,...,XN) 
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and  an  upper  bound  on  the  tightness  of  this  lower  bound: 

i« — ^52 — . 

lower  bound 


(3.18) 


It  is  possible  that  this  upper  bound  could  be  decreased,  but  we  haven't  investigated  this  point. 
Some  evidence  is  given  below  in  Section  5.4. 

And  so,  our  review  and  development  of  numerical  linear  algebra  now  complete,  we  return  to 
discuss  our  primary  topic  where  the  matrices  have  a  random  nature. 


4.  CORRELATION  MATRICES  WITH  RANDOM  SPECTRUM 

Some  background  for  this  section  was  given  in  Section  2.1.  There,  two  essential  steps  in 
defining  correlation  matrices  with  random  spectrum  were  recognized: 

—  pick  a  point  X  =  (Xi,...,Xj^)  ‘at  random’  from  the  simplex  Sjsj,  and  form  the 
diagonal  matrix  D  =  diag(X|,...,Af() 

—  construct  a  matrix  C  =  R*V*DVR,  where  V  is  drawn  at  random  from  the  orthog¬ 
onal  group  0(N),  and  R  is  a  product  of  randomly  selected  rotation/ reflection 
matrices,  chosen  to  successively  put  I’s  on  the  diagonal  of  C. 

Naturally  the  second  step  leaves  a(C)  =  and  hence  leaves  all  spectral  functions  of  C 
unchanged.  Among  such  functions  are  the  spectral  and  Frobenius  norms  of  C,  and  its  condition 
number.  (In  general,  any  unitarily  invariant  norm  of  a  Hermitian  matrix  is  a  spectral  function  of 
that  matrix.)  Consequently,  once  a  probability  distribution  is  selected  on  S^,  so  as  to  define  the 
‘at  random’  condition  of  the  first  step,  the  spectral  functions  of  any  correlation  matrix  defined  by 
the  second  step  may  be  directly  studied.  Note  that  this  approach  does  not  apply  to  other 
numerical  attributes  of  C  of  possible  interest,  such  as  its  individual  entries;  their  distribution 
naturally  depends  in  part  on  those  of  the  V  and  R  matrices. 

4.1  Method  of  Generation 

From  the  preceding  discussion  we  see  that  a  probability  distribution  /a  must  be  specified  on 
the  simplex  S^.  Then  a  sample  K  drawn  from  /a  will  be  a  random  spectrum.  Having  no  reason  to 
weight  any  region  of  more  than  another,  we  will  take  ^  to  be  the  uniform  distribution  on  Sj,j, 
and  denote  by 


A~U(Sn) 

a  point  \  so  chosen.  Two  tasks  remain: 

—  specify  n  analytically 

—  specify  ja  operationally 

This  latter  task  simply  means  to  prescribe  a  method  for  a  computer  to  make  these  draws  in 
terms  of  an  assumed  capability  to  generate  pseudorandom  numbers  ~  U[0,1]. 

The  analytical  specification  of  /a  depends  on  (a  special  case  of)  the  general  theory  of  order 
statistics  and  spacings.  Here  we  only  need  the  case  of  independent  samples  from  the  uniform 

distribution.  Thus  let  U(|)  ^ . ^  *^he  order  statistics  of  a  random  sample  from  U([0,1]) 

Define  U(0)  =  0,  and  U(N)  =  I.  As  shown  by  Wilks,^^  the  joint  distribution  of  these  order  statistics 
is  uniform  over  the  simplex  {0  <Xi^  ...  ^x^.j^l}  in  R*^''.  Then  the  spacings  of  the  sample  are 
defined  by 


Si  =  u^i)  -  U(i.,),  1  ^  i  <  N 
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Observe  that  for  each  sample,  the  spacings  are  positive  numbers  that  sum  to  1.  It  was  also 
shown  by  Wilks  that  the  spacings  vector  (si,...,sj,4.i)  is  uniformly  distributed  over  the  simplex 

N-l 

I 

in  RN-!.  Now,  for  fixed  N,  the  mapping 

T(X|  ,...,Xiv.|)  =  (X|  ,...,Xj^.|,l  -  X|  -  ...  -  Xj^.|) 

carries  this  simplex  bijectively  onto  the  simplex  {0  ^  Xj,  Xj  =  1 }  in  and  carries  over  the 
uniform  density  (by  an  elementary  change  of  variables). 

The  upshot  of  the  preceding  paragraph  is  that,  for  fixed  N,  the  uniform  density  n  on  the 
simplex  S^m  can  be  specified  analytically  as  the  distribution  of  the  random  vector 

NT(s,,...,Sn.,).  (4.1) 

And,  of  course,  it  follows  that  /i  can  be  specified  operationally  in  terms  of  pseudorandom 
number  generator,  and  a  sorting  routine. 

There  is  by  now  a  fairly  extensive  literature  of  spacings  (sometimes  known  as  ‘gaps’,  ‘cover¬ 
ages’,  or  ‘random  division  of  an  interval*).  This  topic  can  be  traced  back  to  the  turn  of  the  cen¬ 
tury  to  the  work  of  W.  Whitworth  on  the  distribution  of  the  largest  spacing.  His  result  was  util¬ 
ized  by  Fisher  (1929)  to  give  a  significance  test  for  the  largest  amplitude  in  a  numerical  harmonic 
analysis  of  a  time  series.  (In  fact,  Fisher’s  test  turns  out  to  be  the  uniformly  most  powerful  sym¬ 
metric  invariant  decision  procedure  against  simple  periodicities.  More  recent  work  is  concerned 
with  compound  periodicities,  and  hence  with  the  distribution  of  other  functions  of  the  spacings 
besides  the  maximum.  Interested  readers  should  consult  the  papers  of  A.  Siegel,  for  example. 
Reference  36.  We  will  not  detail  any  of  his  work  here,  we  merely  wanted  to  draw  attention  to 
the  unexpected  link  between  the  spacings  concept  and  time  series  analysis.) 

In  the  later  1930’s  and  then  the  1940’s  several  authors  including  P.  Levy,  M.  Greenwood, 
and  P.  Moran  worked  on  distributions  of  functions  of  spacings.  Most  of  this  work  originated  as 
specific  problems  in  applied  statistics.  The  best  review  of  ail  this  is  that  of  R.  Pyke,^^  although  at 
a  fairly  high  technical  level.  Other  useful  works  include  References  38  and  39.  These  references 
point  out,  among  many  other  things,  alternative  analytical  specifications  of  spacings.  For 
instance,  if  y|,...,ypi  are  independently  exponentially  distributed  with  arbitrary  mean,  and 
z  =  y|+  ...  +  yfj,  then  the  random  vector 

z"'  (yi,-,yN) 

is  distributed  as  the  spacings  vector  T(S|, ...,  S]y).|).  Hence  random  spectra  can  also  be  generated 
by  use  of  exponential  variates.  From  this  it  follows  that  spacings  can  also  be  simulated  from  the 
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(normalized)  interarrival  times  of  a  Poisson  process  {N(t):  t  ^  0}  with  N(0)  ~  0.  Namely,  if  is 
the  elapsed  time  between  the  (k-l)-st  and  the  k-th  event,  and  t  >  0  is  fixed,  then  the  conditional 
distribution,  given  N(t)  =  n,  of 

r'(Ti,...,T„.,,  t  - 

is  the  same  as  the  spacings  vector.  This  is  a  classic  construction  of  spacings  with  important  mod¬ 
em  applications  to  the  limiting  behavior  of  empirical  processes.^^ 

The  distribution  of  spacings  and  some  functions  thereof  is  also  briefly  discussed  in  Kendall 
and  Moran.'^  Naturally,  geometric  aspects  are  stressed.  For  instance,  the  joint  distribution  of  the 
spacings  is,  with  proper  scale  factor,  exactly  that  of  the  lengths  of  the  N  perpendiculars  from  a 
random  point  inside  the  simplex  to  the  N  sides.  The  authors  go  on  to  discuss  some  situations 
where  probabilities  can  be  computed  from  simplicial  geometry. 


4.2  Distribution  of  Eigenvalues 


Pursuant  to  the  foregoing  discussion  we  take  as  a  random  spectrum  A  «  Sn,  N  times  the 
random  vector  of  spacings  defined  by  a  random  sample  of  size  N  -  I  from  the  standard  uniform 
distribution.  We  denote  this  vector  as  A  =  (A(,...,Aj4).  In  this  short  section  we  discuss  the  distribu¬ 
tion  of  the  Aj,  while  in  the  next  two  sections  we  consider  that  of  certain  functions  of  the  Aj 
related  to  correlation  matrices  C  with  o(C)  ~  A. 


We  first  note  that  the  A,  are  exchangable  random  variables  since,  because  of  the  uniform  dis¬ 
tribution  of  A  on  Sn,  that  distribution  is  unchanged  under  permutation  of  its  components.  It  fol¬ 
lows  that  the  A^  are  identically  distributed  and,  using  the  formula  for  the  least  order  statistic,  the 
distribution  function  F]vj  is  given  by 


FN(t)=l-(l-^) 


(4.2) 


Thus,  for  large  N,  A;  is  approximately  exponentially  distributed  with  mean  1.  From  Equation  (4.2) 
we  can  conclude  that 


E(Ai)  =  1,  var(Ai)  = 


N  -  1 
N+  1 


for  each  i. 
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Expressions  for  the  joint  distribution  of  the  Xj  have  been  given  by  Steutel.^’  Namely, 


Pr(Ai  >a,,...,XN>aN)  = 


,  if  not 


and 


N 


N-1 


Pr(X|  ^  ai,...,XN  ^  “n)  "  *  “  S  ~  N  ^ 

j  =  1 


N  aj  +  “k 

•  2  f5-> 

J,k=  I 


N-l 


-  + 


These  formulae  are  derived  by  Laplace  transform  techniques  and  the  relation,  already  alluded  to 
in  Section  4.1,  between  the  spacings  distribution  and  that  of  certain  exponential  variates. 

In  similar  fashion  one  could  go  on  to  describe  the  joint  distribution  of  pairs  (Ai,Xj),  the 
associated  covariance,  etc.  Here  we  will  just  note  that 


corr(Xi,\j)  =  Yn 


But  actually  all  such  formulae  of  likely  interest  follow  directly  from  the  multiple  moments 
formula 


E(X,  ...  Xn)  =  NPr(N) 


r(pi  1)...  r(p]>f-*- 1) 

r(p  +  N) 


(4.3) 


where  p  =  p|+...+  pj^j.  This  expression  can  either  be  derived  by  the  Laplace  transform  method  of 
Steutel,39  or,  somewhat  more  directly  and  geometrically,  as  in  Kendall  and  Moran  (Reference  40, 
page  34). 


4.3  Distribution  of  Norms 

We  continue  with  the  assumption  that  we  are  dealing  with  a  correlation  matrix  C  whose 
spectrum  X  has  been  chosen  randomly  according  to  U(S]^).  The  issue  now  is  the  distribution  of 
the  norms  ||C||  and  ||C||p,  as  defined  in  Section  3.1. 

Let  us  begin  with  1 1 C 1 1  f2  =  which,  for  both  typographical  and  historical  reasons,  we 
will  denote  by  GM(N).  In  the  statistical  literature  this  quantity  is  known  as  the  Greenwood- 
Moran  statistic,  after  the  authors  of  References  41  and  42.  It  was  originally  proposed  as  a  test 
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for  uniformity  in  response  to  a  problem  in  epidemiology  (time  intervals  between  outbreaks  of  an 
infectious  disease).  Moran^2  derived  a  general  formula  for  the  moments  of  GM(N);  it  was  re¬ 
derived  by  Steutel.^^  For  us,  it  is  enough  to  use  the  moments  formula  (4.3)  to  obtain 


2N2 

E(GM(N))  =  -— 

N  +  1 

,  4N^  (N  +  5) 

E(GM(N))  -  (N  +  1)  (N  +  2)  (N  +  3) 

and  hence 


var(GM(N))  = 


4N^  (N  -  1) 

(N  +  1)2  (N  +  2)  (N  +  3) 


=  0(N) 


If  now  for  fixed  N  we  take  a  large  sample  of  size  n  of  random  correlation  matrices  C  with 
spectra  k  ~  U(S[,j),  we  would  expect  |1C|  jp  to  be  approximately  normally  distributed  with  mean 


2N2 
N+  1 


) 


and  variance  about  l/2n.  This  is  a  consequence  of  general  theory  concerning  asymptotic 
distribution  of  continuous  functions  of  sample  means  (Reference  3S,  page  259).  In  the  particular 
cases  of  N  =  5  and  10,  we  expect,  in  a  large  sample,  to  have  ave(l  |C1  Ip)  about  equal  to  2.89  and 
4.26,  respectively.  The  reader  may  look  back  to  the  first  two  columns  of  Table  1  for  the  actual 
result  of  a  sample  of  size  n  =  1000. 

A  second  point  to  be  made  about  GM(N)  is  that  it  is  (slowly!)  asymptotically  normal,  a 
result  due  originally  to  Moran,‘*2  and  reproved  by  a  more  general  method  by  Darling,^^  see  also 
References  37  and  39.  As  usual,  Pyke  has  the  most  complete  but  also  most  opaque  discussion  of 
this  topic.  Once  this  asymptotic  normality  is  established,  the  corresponding  property  of 
\/GM(N)  =  1 1 C 1 1  p  can  be  worked  out  by  general  theory  concerning  smooth  functions  of 
asymptotically  normal  variates.  In  fact,  since  GM(N)/2N  has  mean  N/(N  +  1)  **  1,  and  variance 
=  0(  1  /  N),  its  asymptotic  normality  implies  that  \/GM(N)/2N  is  asymptotically  normal  with 
mean  1,  and  variance  a^/4.  Hence  \/GM(N)  is  asymptotically  normal  with  mean  \/2N,  and 
variance  ( 1  /  2)Na^  =*1/2. 

Next  we  consider  the  spectral  norm  1|C|1,  for  an  N  X  N  correlation  matrix  C  with  random 
spectrum  as  usual.  Since  I1C1|  =  A^niax>  maximum  eigenvalue  of  C,  the  distribution  of  (|C|(  is 
that  of  N  times  the  maximum  spacing  determined  by  a  random  sample  of  N  -  1  points  from  the 
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standard  uniform  distribution.  We  let  denote  this  maximum  spacing,  so  that  =  1 1 C 1 1  «« 
NVn. 

As  already  noted  in  Section  4.1,  the  distribution  of  goes  back  to  Whitworth  (1897),  and 
has  a  history  of  interesting  statistical  applications.  A  convenient  source  for  this  distribution  is 
Reference  43,  wherein  one  can  also  find  a  derivation  of  the  asymptotic  behavior,  due  originally 
to  Levy  (1939).  We  find  that 


Pr(NVN<x)=|  (-l)Mi;')(l-^)*'‘ 

k  =  0  ^ 


where  (t)^  =  max(t,0).  From  this  one  could  derive  the  mean  and  higher  moments,  as  needed.  As  a 
somewhat  neater  alternative,  we  can  appeal  to  some  well-known  relations  between  the 
distribution  of  the  spacings  and  certain  exponential  variates,  as  briefly  reviewed  in  Section  4. 1 . 
Now  making  use  of  the  fact  that  the  sum  of  exponential  variates  yj  is  gamma-distributed,  and  the 
known  distribution  of  the  order  statistics  from  the  exponential  distribution,  we  can  obtain 

NVf4  ~  N  max{yi}/z. 

Also,  a  formula  was  given  by  Devroye.^ 

N 

yi/o/z 

I 

In  both  cases  z  =  yj  +...+  yjy.  From  all  this  we  can  deduce  that 
E(X„„)  =  E(NVN)=I+y*...*^ 


log  N  +  7 


where  y  =  .577  ...  is  Euler’s  constant. 

Finally,  the  Levy-Darling  asymptotic  formula  for  the  maximum  spacing,  scaled  to  apply  to 
the  maximum  eigenvalue  of  the  matric  C  is 

Pr(NVj»j  <  log  N  +  x)  —  exp(~  e"  *) 

as  N-^.  From  this,  it  follows  that 

vaifNVf^)  —  n-2/6 


as  N—"^. 
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Thus  we  have  obtained  the  exact  means  of  the  norm  functions  I  |C|  |p  and  I  |C|  |,  and  the 
asymptotic  behavior  of  these,  along  with  llC||p,  as  N—**.  In  particular,  we  have  observed  that 
the  Frobenius  norm  tends  to  normality,  while  the  spectral  norm  tends  to  obey  an  extreme  value 
distribution. 


4.4  Condition  Number  Expectation 

We  continue  to  study  an  N  X  N  correlation  matrix  C  with  random  spectrum.  Now  our  focus 
is  on  the  distribution  of  the  condition  number  x(C),  as  defined  by  Equation  (3.6).  As  we  know 
from  Equation  (3.8),  ic(C)  =  the  ratio  of  the  largest  to  the  smallest  eigenvalue  of  C. 

We  have  just  described  the  distribution  of  X^ax  "  IIC|1.  In  fact  the  joint  distribution  of  (X^^^, 
^min)  inferred  from  the  work  of  Darling,^^  in  the  following  form: 

^  ^max  V) 


N-1 


+ 


From  this,  by  letting  y  t  N,  we  obtain  the  distribution  for  the  least  eigenvalue: 


Pr(X„i„>x)  =  (l-x)N-«  0<x<l 


(4.4) 


This  formula  yields  the  moments  of  Xn,i„  as 


E(Ami„)=I/N 


var(X„i„)  =  (N-l)/N2(N+!)  . 

We  might  pause  here  to  collect  together  the  formulas  giving  the  expected  behavior  of  the 
eigenvalues,  and  their  important  functions,  as  a  function  of  N,  for  N  X  N  correlation  matrices 
with  random  spectrum.  Namely,  we  have  seen  that 

E(Xi)  =1  1  ^  Xj  ^  N 

E(^max)  *=  log  N  +  7 

E(^tnin)=  1/N  and 

E(2;xf)«2N 
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Returning  now  to  the  joint  distribution  of  ^  inferred  from 

Reference  43  that  these  quantities  are  asymptotically  independent,  as  a  consequence  of  the 
formula 

Ptf^min  ^  >  ^max  ^ 

-  exp  (-X  -  y), 

as  N-«*.  This  permits  us  to  write,  for  large  N, 

E(k(C))  =  E(\^„/ 

*E(X„3,)E(‘/X„i„). 

However,  although  the  first  factor  is  finite,  as  we  know,  the  second  is  not: 


(I  -(1  -x)N-»)  dx 


(l-x)N-> 


dx 


...)  dx  =  +  oo. 


This  observation  suggests  that  the  condition  number  k(C)  may  not  have  a  finite  first 
moment.  Additional  grounds  for  such  suspicion  can  be  based  on  its  validity  at  the  other 
extreme  case  where  N  =  2.  In  this  simple  case  the  assertion  goes  as  follows:  if  a  single 
number  s  is  drawn  at  random  from  the  interval  [0,  1],  and  U  (respectively  V)  is  the  min 
(respectively  max)  of  {s,  1-s},  then  the  radio  V/U  obeys  the  distribution 


Pr(V/U^t)  =  |^, 


and  so  has  infinite  expectation.  This  formula  is  derived  by  Feller  (Reference  45,  page  24).  We 
now  generalize  this  fact  to  the  case  of  arbitrary  N. 

Theorem  Let  C  be  a  correlation  matrix  with  random  spectrum.  Then  the  condition  number 
k(C)  has  infinite  expectation. 
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Proof.  In  the  notation  of  Section  4.1  we  let  0  <  u^ij  <  U(2)  <  ...  <  U(n.i)  <  1  be  the  order 
statistics  of  a  random  sample  of  size  N  -  1  from  the  standard  uniform  distribution.  The  joint 
distribution  P  of  these  statistics  is  the  ordered  (N  -  l)-variate  Dirichlet  distribution  (Reference  35, 
Section  8.7),  and  is  uniform  over  the  region 


fl  =  {x;  0  <  X(  <  X2  <  ...  <  x^.i  <  I) 


in  RN.  Therefore, 


E(K(C))  =  f...f  - r- - ^ - dP 


min{ . } 

max{ ...  } 


jJ  — HIT- ““<1)  ■  ““(N-.) 

where  T  is  the  subregion  of  fl  defined  by 


x,  ^  min{x2  -  Xj,  X3  -  X2,...,  1  -  xn-|}, 


and  we  have  used  that  the  maximum  spacing^  1/N.  Now,  the  last  multiple  integral  over  T 
exceeds,  for  sufficiently  small  e  >  0,  the  integrated  integral 


(N  -  2)xi  1  -  (N  -  3)x,  1  -  X, 

dx2  j  dxj  ...  j  dxN., 

X,  +  X2  X,  +  Xn.2 

r  l/(N-_)!lx,q(j^) 

J  X, 

0 

where  q  is  a  polynomial.  This  last  integral  is  clearly  divergent.  QED 

This  completes  our  discussion  of  correlation  matrices  with  random  spectrum. 
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5.  CORRELATION  MATRICES  WITH  RANDOM  GRAM  STRUCTURE 

In  this  final  section  we  discuss  random  Gram  matrices,  as  defined  in  Section  1.2,  and  briefly 
discussed  in  Section  2.2,  along  with  some  simulation  results.  We  will  follow  the  same  plan  as  in 
the  preceding  section,  namely,  generating  such  matrices  and  distribution  of  certain  related  ran¬ 
dom  variables.  Finally  we  will  make  a  few  comparisons  between  the  sample  behavior  of  the  two 
types  of  random  matrices. 

5.1  Method  of  Generation 

We  recall  from  Section  1.2  that  an  N  x  N  random  Gram  matrix  C  has  the  form 

C  =  TT*  (5.1) 

where  the  rows  of  T  are  i.i.d.  vectors  distributed  uniformly  on  the  sphere  S’^"’  in  R^.  That  is, 
for  each  row  ij  of  T,  we  have 

tj  ~  U(SN->) 

So,  just  as  in  Section  4.1,  the  first  question  is  how  to  express  such  random  vectors  in  terms  of 
standard  univariate  random  variables. 

This  is  a  well  researched  problem,  with  contributions  dating  back  at  least  30  years.  A  short 
paper  by  Marsaglia^  has  a  review  of  these  early  contributions,  along  with  an  improved  method. 
More  recent  references  are  the  pragmatic  paper  by  Rubinstein,^^  and  the  extensive  book  of 
Devroye.^*  Again  we  distinguish  between  the  analytic  and  the  operational  specification  of 
U(SN-i).  The  basic  analytical  result  is  that  if  X  is  a  continuous  radially  symmetric  N-dimensional 
random  vector,  then  its  projection  on  the  sphere  is  uniformly  distributed,  that  is, 

X/IIXl)  ~  U(SN-l)  . 

In  particular,  we  can  take  X  ~  N(0,I),  the  standard  spherical  multivariate  normal  distribution. 
Operationally,  the  components  of  X  can  be  generated  by  any  of  several  standard  pseudorandom 
normal  variate  routines.  These  eventually  utilize  pseudorandom  uniform  variates.  The  latter  can 
also  be  used  more  directly  to  generate  pseudorandom  U(S^'*)  vectors,  as  is  pointed  out  in  (Ref¬ 
erence  46  or  in  Reference  48,  Chapter  V).  These  are  in  addition  to  the  brute  force  acceptance 
rejection  method,  which  tends  to  be  very  inefficient  for  large  N  (N  ^  5,  say).  However,  we  will 
stick  with  the  projected  normally  distributed  random  vectors. 

Suppose  now  that  we  have  a  random  vector  X  ~  U(SN*').  It  will  be  important  to  know  how 
the  components  of  X  are  distributed.  It  turns  out  that  each 

1  N  -  1 

xf~Beta(Y, — —)  (5.2) 

and  that  the  density  function  of  Xj  is 

Cn(1  ,  |t|<l  (5.3) 
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where  =  r((N/2)/(r(l/2)r((N-l)/2))  is  a  normalizing  constant.  It  is  interesting  to  observe  that 
these  distributions  vary  considerably  with  dimension.  In  particular,  we  see  that  follows  an  arc 
sine  distribution  when  (xi,X2)  ~  U(S‘),  while  xj  is  uniform  on  (-1,1)  when  (X|,X2,X3)  ~  U(S2).  As 
N  increases  beyond  3,  the  density  is  unimodal  with  an  ever  steeper  peak  at  t  =  0. 

We  might  note  here  that  the  joint  density  of  two  or  more  of  the  Xj  is  also  available,  as  a 
consequence  of  some  work  of  Stam.^^ 

As  a  consequence  of  these  facts  we  have  the  following  geometrical  lemmas:  if  X, 
pendently  and  uniformly  distributed  on  S^-*,  then 

E«X,Y»  =  0 

E«X,Y>2)-  1/N 

and 

var«X,Y>2)  =  ~  (5.4c) 

N2(N  +  2) 

Indeed,  Equation  (5.4a)  is  a  consequence  of  the  so-called  “formula  of  total  expectation,” 

E(f(X,Y))  =  E  (E(f(X,Y)|X)) 

for  scalar  functions  of  two  random  vertors.  The  other  two  equations  follow  from  realizing 
<X,Y>2  as  the  squared  length  of  the  projection  of  a  random  point  in  on  a  random  axis, 
along  with  standard  properties  of  the  beta  distribution.  This  geometrical  information  will  be  used 
below  in  the  next  two  sections. 

5.2  Distribution  of  Norms 


Y  are  inde- 

(5.4a) 

(5.4b) 


As  earlier,  in  Section  4.3,  we  will  study  the  distributional  behavior  of  ||C||p  and  ||C||,  where 
C  is  now  a  random  Gram  matrix  of  the  form  of  Equation  (5.1),  with  the  rows  of  T  uniformly 
distributed  over  the  unit  sphere  of  appropriate  dimension. 

The  study  of  ||C||^is  greatly  facilitated  by  the  preceding  results,  since  these  imply  that  the 
square  of  each  off-diagonal  entry  of  C  has  the  beta  distribution  of  Equation  (5.2).  It  follows 
immediately  that 


E(||C||^)  =  N  +  2-  -jjj- 


N(N  -  1) 
2 


=  2N  -  1 

However,  a  variance  formula  is  not  so  immediate,  as  we  indicate  next. 
We  consider  the  second  moment  of  ||C11^ about  the  origin,  that  is. 


(5.5) 


N-l  N 

E(1|C||^=  E((N  +  2  ^  2  cj)2). 

i=l  j=i+l 


(5.6) 
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Recalling  that  the  first  two  moments  of  the  beta  distribution  B(a,b)  are  a,  (a  +  b)  and 
a(a  +  l)/(a  +  b)(a  +  b  +  1),  respectively,  we  have,  upon  expansion  of  Equation  (5.6) 

E(||C1If^)  =  N2  +  4N  ■  Tr-N(N-  l)/2 

N 


+  4 


N(N  +  2) 


N(N  -  t)/2 


(5.7) 


+  4  •  N(N  -  l)/2  [(N(N  -  l)/2)  -  1]  •  x 

where  “x”  is  a  generic  notation  for  E(Cjj2  Cjjj^),  for  i  #  k  or  j  ^  1.  Certainly  if  both  i  k  and 
j  7^  1,  then  X  =  1  /  N2,  by  independence. 

In  the  remaining  cases  we  are  in  the  following  situation.  We  have  3  random  vectors  u,  v,  w 
i.i.d.  U(S^'*)  and  we  are  considering  the  bivariate  distribution  of  (<u,v>,  <u,w>).  This  distribu¬ 
tion  has  also  been  considered  by  Stam,^®  who  gave  a  formula  for  the  density  of  the  trivariate  dis¬ 
tribution  of  (<u,v>,  <u,w>,  <v,w>).  He  also  proved  that  this  distribution  converges  in  total 
variation  to  the  standard  normal  distribution  on  R^.  In  view  of  the  complicated  nature  of  the 
aforementioned  density,  and  of  the  rather  rapid  approach  to  the  normal,  as  shown  by  simula¬ 
tions,  we  will  ignore  possible  weak  dependencies  for  small  N,  and  use  the  approximation 
X  =  1/N2  throughout  Equation  (5.7).  Therefore,  after  collecting  terms  there  we  arrive  at  the 
approximation 

E(||C|||r)-4N2-4N-  I>6  +  (5.8) 

Simulations  show  this  to  be  actually  very  accurate  for  N  ^  5.  (In  fact,  extensive  statistical  testing 
never  permits  rejection  of  the  hypothesis  that  the  variates  <u,v>  and  <u,w>  are  uncorrelated, 
for  any  N.)  Finally  we  see  that 

var  (IlCII^)  «  6 + -jq- -  2 

which,  of  course,  is  approximately  4  for  large  N. 

These  formulas  for  the  first  two  moments  of  I1C1If2  invite  comparison  with  the  correspond¬ 
ing  formulas  for  correlation  matrices  with  random  spectra  developed  in  the  preceding  section. 
While  the  means  are  very  close,  and  asymptotically  equivalent,  there  is  a  distinct  difference  in  the 
behavior  of  the  variances.  Namely,  the  variance  of  IICUp^  when  C  has  random  spectrum  varies  as 
4N,  approximately,  while  that  of  HCjlp^  when  C  is  random  Gram  is  asymptotically  constant. 

As  earlier  in  Section  4.3  we  expect  that  for  fixed  N,  HCHp  is  approximately  normal  with 
mean  \/(2N  -  I),  and  this  is  borne  out  by  simulations. 

Writing  |C||f^  in  the  form  used  in  Equation  (5.6),  and  appealing  to  the  central  limit  theo¬ 
rem,  the  asymptotic  normality  follows  readily: 
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^  ,  N2  -  2N  +  1 

||C|(t~  Normal  (2N  -  I,  4 - ) 

N2  +  N 

for  large  N.  As  in  the  earlier  section  we  could  also  establish  the  asymptotic  normality  of  ||C||p, 
but  at  this  point  that  can  be  left  to  the  interested  reader. 

We  now  want  to  turn  to  the  issue  of  the  distribution  of  the  spectral  norm  ||C||  of  a  random 
Gram  correlation  matrix  C.  This  particular  topic  brings  us  to  the  edge  of  a  large  and  active  field 
of  research  on  the  spectra  of  random  matrices;  see  for  instance  Section  II  of  the  AMS  Confer¬ 
ence  proceedings.50  This  area  has  a  long  history  as  indicated  in  the  papers  of  Girko^*  and 
Geman^2  and  their  references,  as  well  as  the  AMS  volume.  In  turn  it  relates  to  many  studies  in 
the  multivariate  statistics  field  of  spectral  behavior  of  sample  covariance  matrices,  see  for  instance 
T.  Anderson’s  book.' 

The  essential  observation  runs  as  follows.  We  have  C  =  TT*  as  defined  in  Equation  (5.1). 
Then  the  columns  of  T*  are  independent  samples  from  the  uniform  distribution  on  S'''"',  and 
hence  the  matrix 

1  1  N 

=  N  =  N  2 

k=l 

is  the  sample  second  moment  matrix  for  such  a  distribution.  (Here  tj^  is  the  kth  row  of  the 
matrix  T.)  Now  TT*  and  T*T  always  have  the  same  eigenvalues,  and  hence,  as  a  special  case, 

IlCII  =  NIISnII  .  (5.9) 

With  this  observation  we  can  now  refer  to  the  considerable  body  of  previous  work  men¬ 
tioned  in  the  preceding  references,  and  also  to  the  book  of  Watson.53  None  of  this  seems  to  be 
exactly  what  we  need.  In  particular,  it  is  unlikely  that  we  can  ever  know  the  precise  distribution 
of  IICII  for  any  fixed  N.  However,  there  arc  many  asymptotic  results.  Here  we  will  just  take  note 
of  an  improvement  of  Geman’s  theorem  by  Yin,  Bai,  and  Krishnaiah,  as  referenced  by  Yin  and 
Bai  in  Reference  50.  Namely,  let  Xp  be  a  p  X  n  random  matrix  with  i.i.d.  entries,  n  =  n(p)  an 
increasing  function  of  p  with 

lim  n(p)  =  y,  0  <  y  <  <* 

p— 00 

Suppose  that  the  entries  of  Xp  have  mean  0,  variance  o^,  and  finite  fourth  moment.  Then 
lim  II -ir  Xp  x;i|  =  (1  +  \/y)2  <t2 

p— oo 

almost  surely. 

We  will  now  advance  some  reasons  to  support  the  conjecture  that 
limllCII  =4 

N— oo 


(5.10) 


(5.11) 
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almost  surely  when  C  is  an  N  X  N  random  Gram  correlation  matrix.  First,  from  Equation  (5.9) 
we  have 

IICll  =  NIISnII  =  i|T*T|l 


where  Xp  =  •y/W  T*,  p  =  N,  and  the  columns  of  T*  are  i.i.d.  vectors  Now  the  covar¬ 

iance  matrix  of  any  such  random  vector  is  (l/N)Ij^,  where  is  the  N  X  N  identity  matrix.  [This 
can  be  seen  by  noting  that  the  uniform  distribution  is  invariant  under  unitary  transformations, 
hence  the  covariance  matrix  commutes  with  all  unitaries  and  so  must  be  a  multiple  of  Ij^.  That 
the  multiple  is  1/N  follows  from  Equation  (5.2).]  Thus  if  the  entries  of  each  column  were 
genuinely  independent  we  would  be  in  the  situation  where  the  limiting  formula  (5.10)  applied, 
with  y  =  a  =  1.  In  so  far  as  this  independence  is  present  asymptotically,  and  that  condition  is  suf¬ 
ficient,  Equation  (5.11)  would  follow. 

A  second  approach  is  to  begin  with  a  random  matrix  G  =  [gjj],  1  ^  i,  j  <  N,  with  i.i.d. 
entries  gjj,  each  a  standard  normal  variate.  By  Geman’s  theorem 

lim  II -^G  G*||  =4, 

N— oo  N 

almost  surely.  Now  let  D  be  the  N  X  N  diagonal  matrix  with  i^**  diagonal  entry  =  1/  Hi^**  col.  of 
G||,  and  set  T*  -  GD.  Then 

IICll  =N  lISfvll 

=  ||T*T||  =  (|GD2g*(| 

=  II  G(ND2)  G*|| 

N 

and,  in  so  far  as  ND^  **  for  large  N,  we  may  expect  Equation  (5.11)  to  hold.  This  approach 
is,  in  effect,  simply  a  more  rigorous  interpretation  of  the  preceding  approach,  showing  just  how 
the  lack  of  dependency  down  the  various  columns  of  C  appears.  Of  course,  there  is  a  strong  rea¬ 
son  to  believe  that  ND^  —  0  in  some  probabilistic  sense,  as  N  —  based  both  on  simulation 

for  N  ^  200,  and  on  order  statistic  analysis  beginning  with  the  distribution  of  Hi**’  col.  of  GiP  as 
chi2(N). 

Finally,  we  ran  a  brute  force  simulation  for  N  ^  100,  and  obtained  the  empirical  curve  of 
E(||C||)  against  N  shown  in  Figure  1.  Each  data  point  for  N  ^  50  is  based  on  500  trials,  while 
those  for  N  =  75,  100  are  based  on  50  trials.  The  sample  coefficient  of  variation  is  shown  next  to 
the  data  points. 

5.3  Condition  Number  Expectation 

In  Section  4.3  it  was  shown  that  the  condition  number  of  a  correlation  matrix  with  random 
spectrum  has  an  infinite  first  moment.  In  the  present  section  we  will  demonstrate  the  analogous 
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Figure  I.  Empirical  spectral  norm  of  C  -  7T* 


fact  for  random  correlation  matrices  of  Gram  type.  The  numerical  results  reported  back  in  Sec¬ 
tion  2.2  (refer  to  Table  1),  along  with  their  theoretical  result  just  mentioned,  certainly  have  pre¬ 
pared  us  for  this  fact.  Recall  that  the  key  empirical  difference  between  the  two  main  types  of 
random  correlation  matrices  was,  in  fact,  the  excessively  ill-conditioned  nature  of  random  Gram 
matrices.  We  will  discuss  some  other  aspects  of  the  spectral  behavior  of  such  matrices  in  the  next 
section. 


Theorem  Let  C  be  a  random  Gram  correlation  matrix.  Then  the  condition  number  k(C)  has 
infinite  expectation. 


I 


Proof  Making  use  of  Taylor’s  inequality  (3.15)  we  have  C  =  TT*  and 
»c(C)s  iidl  lie-' II  ^  l/min  df 
^  1/d, 2 

where  d,  =  dist(ti,Mi),  t,  =  i*!*  row  of  T,  so  each  t,  is  i.i.d.  and  M,  =  span{tj:  j  ^  i.jNow 

since  condimfM,)  =  1  in  R^,  d,  is  the  magnitude  of  the  projection  of  t,  on  the  line  M,.  Let  u,  be 
a  unit  vector  in  this  subspace;  then 

d?  =  <t|,U|>2 
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That  is,  di^  is  the  squared  length  of  a  random  point  on  a  random  direction,  and,  as  such,  it  has 
the  distribution  of  Equation  (5.2),  with  moments  given  by  Equation  (5.4).  See  also  Reference  10. 
Therefore, 

E(<c(C))^  E(l/d?) 

N  -3 

2  2 
t  (1-t)  dt 


and  this  integral  clearly  diverges  at  0.  QED. 

5.4  Empirical  Spectral  Behavior 

This  final  section  addresses  the  question  “How  random  is  the  spectrum  of  a  random  Gram 
matrix?”.  Now,  in  one  sense,  this  question  has  already  been  answered  by  the  results  of  Sections 
4.3  and  5.2.  Namely,  in  those  sections  we  derived  the  behavior  of  ||C||p2,  ||C||p,  and  j(C||  for 
both  types  of  random  correlation  matrices.  Expressing  these  functions  of  C  in  terms  of  the  eigen¬ 
values  shows  that,  not  all  spectral  functions  behave  the  same,  and  in  particular,  that  random 
Gram  matrices  do  not  have  a  random  spectrum.  Below,  we  will  briefly  discuss  some  other  aspects 
of  this  question. 

Let  us  begin  by  considering  the  behavior  of  the  smallest  eigenvalue  Ajsj  of  an  N  X  N  random 
Gram  matrix  C.  As  in  Section  5.2  there  exists  some  related  theoretical  work  in  the  literature 
(e.g.,  Reference  54),  but  again,  it  is  not  directly  applicable  to  our  situation.  In  view  of  the  appar¬ 
ent  boundedness  of  1|C||  as  N—oo,  and  of  the  simulation  results  from  Section  2.2  which  suggest 
that  random  Gram  matrices  are  much  more  ill-conditioned  than  matrices  with  random  spectrum, 
we  might  expect  A  to  be  much  smaller  than  the  corresponding  value  from  a  correlation  matrix 
with  random  spectrum.  The  distribution  function  of  the  latter  is 

F(x)=  1  -(1 -x)N-l  0<x<l 

as  follows  from  Equation  (4.4),  and  we  have 

E(Amin)^  =  1/N2  «  var(A„in) 

as  earlier  noted. 

Since  we  have,  under  the  null  hypothesis  that  the  spectral  behavior  of  C  is  random,  the 
exact  distribution  and  first  two  moments  of  Aj^,  we  can  test  this  hypothesis  by  a  variety  of 
standard  statistical  tests.  In  particular,  using  the  Kolmogorov-Smirnov  one-sample  test  with  100 
simulated  random  Gram  matrices  of  various  orders  (N  ^  20),  we  are  led  to  decisively  reject  the 
null  hypothesis,  at  the  99  percent  level. 

Instead  of  considering  the  extreme  eigenvalues  A,,  A^  of  C,  we  can  inquire  about  the  behav¬ 
ior,  in  some  suitable  sense,  of  the  entire  spectrum  ct(C).  For  example,  we  have  already  looked  at 
the  statistic 
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N 

l|C|i^= 

1 

and  noted  that  asymptotically  its  mean  behavior  is  that  of  a  matrix  with  random  spectrum,  but 
its  second  moment  behavior  is  quite  different.  Another  approach  is  along  the  following  lines. 
Beginning  with  the  N  X  N  random  Gram  matrix  C,  and  its  spectrum  ct(C)  =  |Xi,  .  .  .  ,  we 
can  invert  the  spacings  method  of  Section  4.1  to  arrive  at  a  sample  of  points  jX],  .  .  .  ,  X]>4.|j  in 
the  unit  interval.  The  transformation  is 

1 

*k  =  ^N-l  ■^n) 

for  1  ^  k  ^  N  -  1.  Under  the  null  hypothesis  this  set  of  x's  is  a  sample  from  the  uniform  distri¬ 
bution,  and  various  statistics  computed  from  this  sample  can  be  used  for  a  test. 

As  an  example,  we  considered  Neyman’s  test^^  for  uniformity.  Fixing  N,  we  generated  a 
batch  of  1000  random  Gram  matrices  C,  obtained  their  spectrum  and  the  resulting  points 
{X|,  .  .  .  ,Xn.i}  in  [0,1].  Then  Neyman’s  sutisti, 

+  vl 

was  computed,  where  the  vj  are  the  sample  Fourier-Legendre  coefficients  when  the  density  func¬ 
tion  f  (from  which  the  x’s  are  drawn)  is  expanded  in  terms  of  Legendre  polynomials; 

f(x)  =  c  exp(l  +  2  Cj  Lj(x)). 

The  motivation  and  theory  of  this  test  is  discussed  in  the  reference,  and  will  not  be  given  here. 
The  distribution  of  is  known  approximately,  and  is  asymptotically  chi2(2).  The  null  hypothesis 
is  to  be  rejected  for  large  values  of  N|.  For  each  N  we  calculated  the  fraction  of  the  1000  sam¬ 
ples  that  exceeded  various  percentage  points  of  the  distribution,  with  the  results  indicated  in 
Table  2.  It  is  evident  from  these  figures  and  the  large  number  of  trials  that  the  null  hypothesis  of 
uniformity  must  be  rejected.  A  closer  examination  of  the  data  reveals  that  not  only  is  there  a 
very  small  eigenvalue  noted  above,  but  in  fact  there  are  enough  small  eigenvalues  to  pull 

the  sample  mean  x  far  enough  below  0.5  to  greatly  inflate  the  value  of  Vj  (precisely, 

v,  =  \/(12n)(x  -  y) 

where  n  =  sample  size  =  1000,  here).  Incidentally,  the  sample  coefficient  of  variation  of  the  Ney- 
man  statistics  decreased  steadily  from  0.27  at  N  =  5  to  0.045  at  N  =  30,  showing  very  little  scatter 
about  the  increasingly  large  values  of  N^. 

Finally,  we  offer  two  comments  about  the  empirical  behavior  of  the  condition  number  of 
random  Gram  matrices.  First,  for  various  N(  ^  20)  we  generated  batches  of  1000  each  of  random 
Gram  matrices  and  correlation  matrices  with  random  spectra,  and  performed  a  Kolmogorov- 
Smirnov  two-sample  test  on  the  respective  condition  numbers,  to  test  the  null  hypothesis  of  a 
common  distribution.  This  hypothesis  was  decisively  rejected  for  all  values  of  N,  this  rejection 
continued  when  the  samples  were  subjected  to  trimming. 
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Second,  bearing  in  mind  the  condition  number  bounds  established  in  Section  3.2,  we  studied 
by  simulation  the  tightness  of  the  upper  bound  (3.18).  That  is,  for  various  N(  ^  50)  we  generated 
batches  of  random  Gram  matrices,  computed  their  condition  numbers,  the  co-linearity  measure 
on  the  right  hand  side  of  Equation  (3.15),  and  then  their  ratio  as  in  Equation  (3.18).  The  results 
are  displayed  in  Table  3.  They  suggest  that  the  admittedly  crude  upper  bound  in  Equation  (3.18) 
can  indeed  be  reduced,  and  perh;ips  even  be  replaced  by  a  term  that  is  of  order  o(N). 


TABLE  2 

Fraction  of  Neyman  Statistics  Exceeding  Various 
Percentage  Points,  and  Sample  Average 

N  (percent) 

50 

90 

95 

Mean 

N22 

5 

99.8 

45. 

10.5 

4.3 

8 

100. 

98.5 

84.4 

7.0 

10 

— 

100. 

99.5 

8.9 

15 

— 

— 

100. 

13.3 

20 

— 

— 

— 

17.7 

30 

— 

— 

— 

26.5 

TABLE  3 

Empirical  Ratio  of  Condition  Number  of  Colinearity 
Measure  for  Random  Gram  Matrices 

N 

Batch  Size 

Sample 

Mean  Ratio 

Sample 

Coeff.  of  Var. 

5 

1000 

5.20 

.21 

10 

1000 

8.82 

.25 

20 

100 

15.86 

.24 

35 

100 

22.65 

.24 

50 

100 

29.28 

.22 
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6.  SUMMARY 


Let  us  summarize  not  only  the  foregoing  technicalities,  but  also  the  place  of  this  material  in 
a  larger  scheme.  In  addition,  we  will  point  out  several  issues  that  remain  to  be  resolved. 

As  noted  at  the  outset,  our  interest  in  random  correlation  matrices  stems  from  their  interpre¬ 
tation  as  covariance  matrices  of  purely  random  or  ‘average’  (standardized)  signals.  A  research 
project  now  underway  has  as  its  goal  the  evaluation  of  the  efficacy  of  various  group-theoretic  sig¬ 
nal  processing  algorithms.  One  ingredient  that  must  be  specified  before  a  well-defined  question 
can  be  posed  in  this  context  is  a  definite  signal  model.  As  remarked  in  the  Introduction,  such 
models  can  either  be  defined  by  a  few  (typically  ^  2)  parameters,  or  they  can  be  essentially  non- 
parametric.  A  further  possible  subdivision  of  this  latter  class  is  into  random  stationary  signals,  or 
into  purely  random  signals.  The  corresponding  covariance  matrices  are  then  random  correlation 
matrices  with,  in  the  first  case,  a  Toeplitz  structure.  We  have  not  discussed  such  special  random 
matrices  because  it  appears  that,  for  practical  purposes,  most  such  behavior  can  be  at  least 
approximated  by,  for  example,  varying  the  parameters  in  an  AR(2)  signal  model.  Nevertheless, 
the  question  of  generating  random  Toeplitz  correlation  matrices,  and  the  statistical  behavior  of 
the  corresponding  entries,  spectral  functions,  etc.,  is  interesting,  and  is  being  studied,  with  results 
to  be  reported  elsewhere. 

We  therefore  have  chosen  to  concentrate  on  random  correlation  matrices  of  the  two  princi¬ 
pal  types  defined  in  Section  1.2,  and  studied  in  detail  in  Sections  2,  4,  and  5.  We  observed  early 
on  that  random  Gram  matrices  exhibited  a  comparatively  wild  spectral  behavior  relative  to  corre¬ 
lation  matrices  with  random  spectrum.  As  we  discovered  later,  this  behavior  is  due  to  the  pres¬ 
ence,  on  average,  of  several  much  smaller  eigenvalues  than  is  consistent  with  the  hypothesis  of  a 
random  spectrum.  In  fact,  a  variety  of  both  theoretical  and  empirical  results  shows  that  random 
Gram  matrices  do  not  have  random  spectrum;  these  are  reviewed  in  Section  5.4. 

In  addition  to  collecting  together  numerous  known  results  from  the  general  statistics  litera¬ 
ture,  and  interpreting  them  in  the  present  context  of  random  correlation  matrices,  we  have  devel¬ 
oped  some  new  theoretical  results.  Specifically,  in  Section  3.2  we  have  extended  the  earlier  work 
of  J.  Taylor  on  condition  number  lower  bounds,  and  we  assessed  their  tightness.  This  result  is 
strictly  deterministic.  We  then  used  this  bound  to  show  that  the  condition  number  of  random 
Gram  matrices  (of  a  fixed  size)  has  an  infinite  first  moment.  In  view  of  our  earlier  empirical 
observations,  this  conclusion  was  not  a  complete  surprise.  Yet  it  also  turned  out  that  correlation 
matrices  with  random  spectrum  also  have  infinite  first  moment  (for  each  fixed  dimension  N  ^  2, 
the  case  N  =  2  being  due  to  W.  Feller). 

We  might  offer  an  additional  comment  on  the  condition  of  random  Gram  matrices.  Namely, 
referring  back  to  the  basic  definition  (Section  5.1),  we  could  allow  the  row  vectors  t,  there  to  be 
drawn  randomly  from  the  unit  sphere  in  a  larger  dimensional  space.  Geometric  intuition  suggests 
that  with  more  ‘room’  in  the  sample  space,  collinearity  should  be  less  of  a  problem,  with  conse¬ 
quent  improvement  in  conditioning.  Numerical  experiments  show  that,  to  an  extent,  this  expecta¬ 
tion  is  fulfilled.  For  example,  in  constrast  with  the  data  reported  in  Table  1,  the  mean  (respective 
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median)  condition  number  of  500  5  X  5  random  Gram  matrices  based  on  vectors  drawn  uni¬ 
formly  from  the  sphere  is  11.6  (respectively  8.7).  The  corresponding  values  for  500  10  X  10 
random  Gram  matrices  based  on  vectors  drawn  from  are  17.5  (respectively  15.2).  However,  it 
is  not  yet  clear  whether  such  higher  dimensional  random  Gram  matrices  have  random  spectrum 
(eventually  perhaps,  but  not  initially!),  or  whether  they  have  a  finite  first  moment.  This  appears 
to  be  an  attractive  research  area. 

Finally,  we  note  that  some  unresolved  issues  remain.  In  addition  to  specific  technical  ques¬ 
tions,  such  as  the  best  bound  in  Equation  (3.18)  (that  is,  the  exact  relation  between  the  condition 
number  of  a  Gram  matrix  and  its  Taylor  lower  bound),  and  validation  of  the  conjectured  limit 
of  the  spectral  norm  of  large  random  Gram  matrices  in  Equation  (5.11),  there  is  the  issue  of  the 
intended  application  of  these  random  correlation  matrices  to  specific  statistical  testing  procedures 
and  simulations.  In  the  present  case  the  area  of  interest  was  described  in  the  introduction.  These 
matrices  will  serve  to  model  random  signals,  the  latter  in  turn  serving  as  inputs  to  various  signal 
processors  defined  by  group  filters  and  transforms,  the  objective  being  to  assess  the  relative  value 
of  different  groups  of  a  common  order  (especially  N  =  2")  for  specific  signal  processing  tasks 
such  as  Wiener  filtering,  decorrelation,  data  compression,  etc.  However,  the  question  of  whether 
one  type  of  random  correlation  matrix  should  be  preferred  to  another,  for  this  particular  applica¬ 
tions,  remains  to  be  settled. 
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